Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
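
The survey's central object is the plain SGD update $x \leftarrow x - \eta \nabla f_i(x)$ on a finite sum. As a baseline for the variance-reduced methods below, a minimal sketch (generic textbook form, not code from the paper; `grad_i` and the toy least-squares problem are illustrative):

```python
import numpy as np

def sgd(grad_i, x0, n, lr=0.1, epochs=30, seed=0):
    """Plain SGD on a finite sum (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component at x.
    Generic textbook sketch, not code from the survey.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x -= lr * grad_i(x, i)   # step along one sampled component gradient
    return x

# Toy usage: least squares (1/(2n)) * sum_i (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
x_hat = sgd(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(5), n=50, lr=0.01)
```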

Scaffold: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
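
The core of SCAFFOLD is a drift-corrected local update that subtracts a client control variate and adds the server one. A schematic sketch of one client round, assuming a `grad_fn` stochastic-gradient interface and numpy parameter vectors (illustrative names, not the paper's code):

```python
import numpy as np

def scaffold_client_round(x, c, c_i, grad_fn, lr=0.1, local_steps=10):
    """One SCAFFOLD client round (schematic).

    x      : server model received by the client
    c, c_i : server and client control variates
    grad_fn(y) : stochastic gradient of the client's local objective at y.
    Returns the model delta, control-variate delta, and new client variate.
    """
    y = x.copy()
    for _ in range(local_steps):
        # Drift correction: subtract the client variate, add the server variate.
        y -= lr * (grad_fn(y) - c_i + c)
    # "Option II" control-variate update from the paper.
    c_i_new = c_i - c + (x - y) / (local_steps * lr)
    return y - x, c_i_new - c_i, c_i_new
```

The server then averages the returned model deltas into $x$ and the control-variate deltas into $c$.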

Momentum-based variance reduction in non-convex SGD

A Cutkosky, F Orabona - Advances in neural information …, 2019 - proceedings.neurips.cc
Variance reduction has emerged in recent years as a strong competitor to stochastic
gradient descent in non-convex problems, providing the first algorithms to improve upon the …
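
The estimator behind this line of work (STORM-style) mixes a fresh stochastic gradient with a momentum correction evaluated on the same sample at the previous iterate, avoiding full-gradient checkpoints. A schematic sketch, with `grad_i(x, i)` an assumed component-gradient interface:

```python
import numpy as np

def storm_sketch(grad_i, x0, n, lr=0.05, a=0.1, iters=1000, seed=0):
    """Momentum-based variance reduction (STORM-style), schematic.

    d_t = grad_i(x_t, i) + (1 - a) * (d_{t-1} - grad_i(x_{t-1}, i)),
    using the same sample i at both points.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    d = grad_i(x_prev, rng.integers(n))     # initialize with one stochastic gradient
    x = x_prev - lr * d
    for _ in range(iters):
        i = rng.integers(n)
        d = grad_i(x, i) + (1.0 - a) * (d - grad_i(x_prev, i))
        x_prev, x = x, x - lr * d
    return x
```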

Katyusha: The first direct acceleration of stochastic gradient methods

Z Allen-Zhu - Journal of Machine Learning Research, 2018 - jmlr.org
Nesterov's momentum trick is famously known for accelerating gradient descent, and has
been proven useful in building fast iterative algorithms. However, in the stochastic setting …
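
Katyusha couples an SVRG-style gradient estimator with Nesterov momentum plus a "negative momentum" pull toward the last snapshot. A simplified sketch of the loop structure (illustration only: fixed parameters and a plain-average snapshot instead of the paper's weighted average; `grad_i`/`full_grad` are assumed interfaces):

```python
import numpy as np

def katyusha_sketch(grad_i, full_grad, x0, n, L, tau1=0.5, epochs=20, seed=0):
    """Simplified Katyusha-style loop, schematic."""
    rng = np.random.default_rng(seed)
    tau2 = 0.5                                  # snapshot weight ("negative momentum")
    alpha = 1.0 / (3.0 * tau1 * L)
    y = z = snap = np.array(x0, dtype=float)
    m = 2 * n                                   # inner-loop length
    for _ in range(epochs):
        mu = full_grad(snap)                    # full gradient at the snapshot
        y_sum = np.zeros_like(y)
        for _ in range(m):
            x = tau1 * z + tau2 * snap + (1 - tau1 - tau2) * y
            i = rng.integers(n)
            g = mu + grad_i(x, i) - grad_i(snap, i)   # SVRG estimator
            y = x - g / (3.0 * L)               # short (gradient) step
            z = z - alpha * g                   # long (mirror) step
            y_sum += y
        snap = y_sum / m
    return snap
```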

Minimizing finite sums with the stochastic average gradient

M Schmidt, N Le Roux, F Bach - Mathematical Programming, 2017 - Springer
We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite
number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG …
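
SAG keeps a table with the most recent gradient observed for each component and steps along the average of that table, so each iteration costs one component gradient but uses information from all of them. A schematic sketch, with `grad_i(x, i)` an assumed interface and O(n·d) memory for the table:

```python
import numpy as np

def sag_sketch(grad_i, x0, n, lr=0.01, iters=5000, seed=0):
    """Stochastic average gradient (SAG), schematic."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    table = np.zeros((n, x.size))        # last gradient seen for each component
    avg = np.zeros_like(x)               # running mean of the table
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_i(x, i)
        avg += (g_new - table[i]) / n    # refresh the average in O(d)
        table[i] = g_new
        x -= lr * avg                    # step along the averaged gradient
    return x
```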

A proximal stochastic gradient method with progressive variance reduction

L Xiao, T Zhang - SIAM Journal on Optimization, 2014 - SIAM
We consider the problem of minimizing the sum of two convex functions: one is the average
of a large number of smooth component functions, and the other is a general convex …
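
The method combines an SVRG-type variance-reduced gradient for the smooth average with a proximal step on the general convex term. A schematic sketch, assuming `grad_i`, `full_grad`, and a `prox(v, step)` operator for the nonsmooth part (illustrative interfaces):

```python
import numpy as np

def prox_svrg_sketch(grad_i, full_grad, prox, x0, n, lr=0.01, epochs=20, seed=0):
    """Proximal stochastic gradient with progressive variance reduction, schematic.

    Minimizes f(x) + h(x), with f a finite sum of smooth terms and h handled via
    prox(v, step) = argmin_u h(u) + ||u - v||^2 / (2 * step).
    """
    rng = np.random.default_rng(seed)
    snap = np.array(x0, dtype=float)
    m = 2 * n                                        # inner-loop length
    for _ in range(epochs):
        mu = full_grad(snap)                         # full gradient at the snapshot
        x = snap.copy()
        x_sum = np.zeros_like(x)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(snap, i) + mu  # variance-reduced gradient
            x = prox(x - lr * v, lr)                 # proximal step on h
            x_sum += x
        snap = x_sum / m                             # next snapshot = average iterate
    return snap
```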

Natasha 2: Faster non-convex optimization than SGD

Z Allen-Zhu - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We design a stochastic algorithm to find $\varepsilon$-approximate local minima of any
smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to …

[BOOK] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Second-order stochastic optimization for machine learning in linear time

N Agarwal, B Bullins, E Hazan - Journal of Machine Learning Research, 2017 - jmlr.org
First-order stochastic methods are the state-of-the-art in large-scale machine learning
optimization owing to efficient per-iteration complexity. Second-order methods, while able to …
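
The linear-time idea rests on estimating Hessian-inverse-vector products with a sampled Neumann-series recursion, using only Hessian-vector products. A schematic sketch of that estimator (LiSSA-style; `hvp_i` and `scale` are assumed, and the scaled Hessian must have eigenvalues in (0, 1]):

```python
import numpy as np

def newton_direction_sketch(hvp_i, g, n, depth=50, scale=1.0, seed=0):
    """Estimate H^{-1} g via a sampled Neumann series, schematic.

    hvp_i(v, i) : Hessian-vector product of the i-th sample with v.
    The recursion v_j = g + (I - H_i/scale) v_{j-1} converges in expectation
    to (H/scale)^{-1} g when the scaled Hessian's spectrum lies in (0, 1].
    """
    rng = np.random.default_rng(seed)
    v = g.copy()
    for _ in range(depth):
        i = rng.integers(n)
        v = g + v - hvp_i(v, i) / scale
    return v / scale                      # undo the scaling: approx H^{-1} g
```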

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems

L Luo, H Ye, Z Huang, T Zhang - Advances in Neural …, 2020 - proceedings.neurips.cc
We consider nonconvex-concave minimax optimization problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is strongly-concave in $\mathbf{y}$ but …
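
As a rough illustration of this setting (not the paper's exact algorithm), stochastic gradient descent-ascent can be paired with recursive, SARAH-style variance-reduced estimators for both blocks, refreshed periodically with full gradients; all interfaces below are assumptions:

```python
import numpy as np

def recursive_gda_sketch(gx_i, gy_i, gx_full, gy_full, x0, y0, n,
                         lr_x=0.01, lr_y=0.05, period=100, iters=2000, seed=0):
    """Stochastic gradient descent-ascent with recursive variance-reduced
    estimators, a schematic simplification for min-max problems that are
    strongly concave in y.
    """
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    vx, vy = gx_full(x, y), gy_full(x, y)          # initialize with full gradients
    for t in range(iters):
        x_new = x - lr_x * vx                      # descent step on x
        y_new = y + lr_y * vy                      # ascent step on y
        if (t + 1) % period == 0:
            vx, vy = gx_full(x_new, y_new), gy_full(x_new, y_new)   # refresh
        else:
            i = rng.integers(n)
            # Recursive (SARAH-style) update: reuse the same sample at both points.
            vx = vx + gx_i(x_new, y_new, i) - gx_i(x, y, i)
            vy = vy + gy_i(x_new, y_new, i) - gy_i(x, y, i)
        x, y = x_new, y_new
    return x, y
```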