Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
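
The survey's central object is the plain SGD update $x \leftarrow x - \eta \nabla f_i(x)$ on a finite sum. As a baseline for the variance-reduced methods below, a minimal sketch (generic textbook form, not code from the paper; `grad_i` and the toy least-squares problem are illustrative):

```python
import numpy as np

def sgd(grad_i, x0, n, lr=0.1, epochs=30, seed=0):
    """Plain SGD on a finite sum (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component at x.
    Generic textbook sketch, not code from the survey.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= lr * grad_i(x, i)   # step along one sampled component gradient
    return x

# Toy usage: least squares (1/(2n)) * sum_i (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
x_hat = sgd(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(5), n=50, lr=0.01)
```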

Scaffold: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
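
The core of SCAFFOLD is a drift-corrected local update that subtracts a client control variate and adds the server one. A schematic sketch of one client round, assuming a `grad_fn` stochastic-gradient interface and numpy parameter vectors (illustrative names, not the paper's code):

```python
import numpy as np

def scaffold_client_round(x, c, c_i, grad_fn, lr=0.1, local_steps=10):
    """One SCAFFOLD client round (schematic).

    x      : server model received by the client
    c, c_i : server and client control variates
    grad_fn(y) : stochastic gradient of the client's local objective at y.
    Returns the model delta, control-variate delta, and new client variate.
    """
    y = x.copy()
    for _ in range(local_steps):
        # Drift correction: subtract the client variate, add the server variate.
        y -= lr * (grad_fn(y) - c_i + c)
    # "Option II" control-variate update from the paper.
    c_i_new = c_i - c + (x - y) / (local_steps * lr)
    return y - x, c_i_new - c_i, c_i_new
```

The server then averages the returned model deltas into $x$ and the control-variate deltas into $c$.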

Momentum-based variance reduction in non-convex SGD

A Cutkosky, F Orabona - Advances in neural information …, 2019 - proceedings.neurips.cc
Variance reduction has emerged in recent years as a strong competitor to stochastic
gradient descent in non-convex problems, providing the first algorithms to improve upon the …
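
The estimator behind this line of work (STORM-style) mixes a fresh stochastic gradient with a momentum correction evaluated on the same sample at the previous iterate, avoiding full-gradient checkpoints. A schematic sketch, with `grad_i(x, i)` an assumed component-gradient interface:

```python
import numpy as np

def storm_sketch(grad_i, x0, n, lr=0.05, a=0.1, iters=1000, seed=0):
    """Momentum-based variance reduction (STORM-style), schematic.

    d_t = grad_i(x_t, i) + (1 - a) * (d_{t-1} - grad_i(x_{t-1}, i)),
    using the same sample i at both points.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    d = grad_i(x_prev, rng.integers(n))     # initialize with one stochastic gradient
    x = x_prev - lr * d
    for _ in range(iters):
        i = rng.integers(n)
        d = grad_i(x, i) + (1.0 - a) * (d - grad_i(x_prev, i))
        x_prev, x = x, x - lr * d
    return x
```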

Katyusha: The first direct acceleration of stochastic gradient methods

Z Allen-Zhu - Journal of Machine Learning Research, 2018 - jmlr.org
Nesterov's momentum trick is famously known for accelerating gradient descent, and has
been proven useful in building fast iterative algorithms. However, in the stochastic setting …
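
Katyusha couples an SVRG-style gradient estimator with Nesterov momentum plus a "negative momentum" pull toward the last snapshot. A simplified sketch of the loop structure (illustration only: fixed parameters and a plain-average snapshot instead of the paper's weighted average; `grad_i`/`full_grad` are assumed interfaces):

```python
import numpy as np

def katyusha_sketch(grad_i, full_grad, x0, n, L, tau1=0.5, epochs=20, seed=0):
    """Simplified Katyusha-style loop, schematic."""
    rng = np.random.default_rng(seed)
    tau2 = 0.5                                  # snapshot weight ("negative momentum")
    alpha = 1.0 / (3.0 * tau1 * L)
    y = z = snap = np.array(x0, dtype=float)
    m = 2 * n                                   # inner-loop length
    for _ in range(epochs):
        mu = full_grad(snap)                    # full gradient at the snapshot
        y_sum = np.zeros_like(y)
        for _ in range(m):
            x = tau1 * z + tau2 * snap + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = mu + grad_i(x, i) - grad_i(snap, i)   # SVRG estimator
            y = x - g / (3.0 * L)               # short (gradient) step
            z = z - alpha * g                   # long (mirror) step
            y_sum += y
        snap = y_sum / m
    return snap
```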

Minimizing finite sums with the stochastic average gradient

M Schmidt, N Le Roux, F Bach - Mathematical Programming, 2017 - Springer
We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite
number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG …
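
SAG keeps a table with the most recent gradient observed for each component and steps along the average of that table, so each iteration costs one component gradient but uses information from all of them. A schematic sketch, with `grad_i(x, i)` an assumed interface and O(n·d) memory for the table:

```python
import numpy as np

def sag_sketch(grad_i, x0, n, lr=0.01, iters=5000, seed=0):
    """Stochastic average gradient (SAG), schematic."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    table = np.zeros((n, x.size))        # last gradient seen for each component
    avg = np.zeros_like(x)               # running mean of the table
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(x, i)
        avg += (g_new - table[i]) / n    # refresh the average in O(d)
        table[i] = g_new
        x -= lr * avg                    # step along the averaged gradient
    return x
```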

A proximal stochastic gradient method with progressive variance reduction

L Xiao, T Zhang - SIAM Journal on Optimization, 2014 - SIAM
We consider the problem of minimizing the sum of two convex functions: one is the average
of a large number of smooth component functions, and the other is a general convex …
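
The method combines an SVRG-type variance-reduced gradient for the smooth average with a proximal step on the general convex term. A schematic sketch, assuming `grad_i`, `full_grad`, and a `prox(v, step)` operator for the nonsmooth part (illustrative interfaces):

```python
import numpy as np

def prox_svrg_sketch(grad_i, full_grad, prox, x0, n, lr=0.01, epochs=20, seed=0):
    """Proximal stochastic gradient with progressive variance reduction, schematic.

    Minimizes f(x) + h(x), with f a finite sum of smooth terms and h handled via
    prox(v, step) = argmin_u h(u) + ||u - v||^2 / (2 * step).
    """
    rng = np.random.default_rng(seed)
    snap = np.array(x0, dtype=float)
    m = 2 * n                                        # inner-loop length
    for _ in range(epochs):
        mu = full_grad(snap)                         # full gradient at the snapshot
        x = snap.copy()
        x_sum = np.zeros_like(x)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(snap, i) + mu  # variance-reduced gradient
            x = prox(x - lr * v, lr)                 # proximal step on h
            x_sum += x
        snap = x_sum / m                             # next snapshot = average iterate
    return snap
```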

Natasha 2: Faster non-convex optimization than SGD

Z Allen-Zhu - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We design a stochastic algorithm to find $\varepsilon$-approximate local minima of any
smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to …

[BOOK] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Second-order stochastic optimization for machine learning in linear time

N Agarwal, B Bullins, E Hazan - Journal of Machine Learning Research, 2017 - jmlr.org
First-order stochastic methods are the state-of-the-art in large-scale machine learning
optimization owing to efficient per-iteration complexity. Second-order methods, while able to …
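
The linear-time idea rests on estimating Hessian-inverse-vector products with a sampled Neumann-series recursion, using only Hessian-vector products. A schematic sketch of that estimator (LiSSA-style; `hvp_i` and `scale` are assumed, and the scaled Hessian must have eigenvalues in (0, 1]):

```python
import numpy as np

def newton_direction_sketch(hvp_i, g, n, depth=50, scale=1.0, seed=0):
    """Estimate H^{-1} g via a sampled Neumann series, schematic.

    hvp_i(v, i) : Hessian-vector product of the i-th sample with v.
    The recursion v_j = g + (I - H_i/scale) v_{j-1} converges in expectation
    to (H/scale)^{-1} g when the scaled Hessian's spectrum lies in (0, 1].
    """
    rng = np.random.default_rng(seed)
    v = g.copy()
    for _ in range(depth):
        i = rng.integers(n)
        v = g + v - hvp_i(v, i) / scale
    return v / scale                      # undo the scaling: approx H^{-1} g
```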

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems

L Luo, H Ye, Z Huang, T Zhang - Advances in Neural …, 2020 - proceedings.neurips.cc
We consider nonconvex-concave minimax optimization problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is strongly-concave in $\mathbf{y}$ but …
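
As a rough illustration of this setting (not the paper's exact algorithm), stochastic gradient descent-ascent can be paired with recursive, SARAH-style variance-reduced estimators for both blocks, refreshed periodically with full gradients; all interfaces below are assumptions:

```python
import numpy as np

def recursive_gda_sketch(gx_i, gy_i, gx_full, gy_full, x0, y0, n,
                         lr_x=0.01, lr_y=0.05, period=100, iters=2000, seed=0):
    """Stochastic gradient descent-ascent with recursive variance-reduced
    estimators, a schematic simplification for min-max problems that are
    strongly concave in y.
    """
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    vx, vy = gx_full(x, y), gy_full(x, y)          # initialize with full gradients
    for t in range(iters):
        x_new = x - lr_x * vx                      # descent step on x
        y_new = y + lr_y * vy                      # ascent step on y
        if (t + 1) % period == 0:
            vx, vy = gx_full(x_new, y_new), gy_full(x_new, y_new)   # refresh
        else:
            i = rng.integers(n)
            # Recursive (SARAH-style) update: reuse the same sample at both points.
            vx = vx + gx_i(x_new, y_new, i) - gx_i(x, y, i)
            vy = vy + gy_i(x_new, y_new, i) - gy_i(x, y, i)
        x, y = x_new, y_new
    return x, y
```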