DoG is SGD's best friend: A parameter-free dynamic step size schedule

M Ivgi, O Hinder, Y Carmon - International Conference on …, 2023 - proceedings.mlr.press
We propose a tuning-free dynamic SGD step size formula, which we call Distance over
Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from …
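
A minimal sketch of the distance-over-gradients idea in Python, assuming the step size at iteration t is the largest distance from the starting point seen so far divided by the square root of the accumulated squared gradient norms; the function name dog_sgd and the small seed r_eps are illustrative, not the paper's exact specification.

```python
import numpy as np

def dog_sgd(grad, x0, steps=1000, r_eps=1e-6):
    # Illustrative distance-over-gradients rule (an assumption, not the paper's
    # exact algorithm): eta_t = max_{i<=t} ||x_i - x_0|| / sqrt(sum_{i<=t} ||g_i||^2).
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    max_dist = r_eps                     # small seed so the first step is nonzero
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += float(g @ g)
        x = x - (max_dist / (np.sqrt(grad_sq_sum) + 1e-12)) * g
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
    return x

# Toy usage: f(x) = 0.5 * ||x - 1||^2, whose gradient is x - 1.
print(dog_sgd(lambda x: x - np.ones(5), x0=np.zeros(5)))
```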

Acceleration methods

A d'Aspremont, D Scieur, A Taylor - Foundations and Trends® …, 2021 - nowpublishers.com
This monograph covers some recent advances in a range of acceleration techniques
frequently used in convex optimization. We first use quadratic optimization problems to …
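
Since the monograph introduces acceleration through quadratic problems, here is a hedged comparison of plain gradient descent and Nesterov's accelerated method on a strongly convex quadratic; the 1/L step size and the (sqrt(kappa)-1)/(sqrt(kappa)+1) momentum are standard textbook choices, used here as assumptions rather than the monograph's own presentation.

```python
import numpy as np

# Sketch: gradient descent vs. Nesterov acceleration on the quadratic
# f(x) = 0.5 * x^T A x - b^T x, with the textbook step 1/L and momentum
# (sqrt(kappa) - 1) / (sqrt(kappa) + 1); these choices are assumptions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 50))
A = Q.T @ Q + np.eye(50)                 # positive definite Hessian
b = rng.standard_normal(50)
x_star = np.linalg.solve(A, b)

evals = np.linalg.eigvalsh(A)
L, mu = evals.max(), evals.min()         # smoothness and strong convexity constants
beta = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)

def grad(x):
    return A @ x - b

x_gd = np.zeros(50)
x_nag = np.zeros(50)
y = np.zeros(50)
for _ in range(200):
    x_gd = x_gd - grad(x_gd) / L         # plain gradient descent
    x_new = y - grad(y) / L              # Nesterov: gradient step at the extrapolated point
    y = x_new + beta * (x_new - x_nag)   # then extrapolate
    x_nag = x_new

print("GD error:      ", np.linalg.norm(x_gd - x_star))
print("Nesterov error:", np.linalg.norm(x_nag - x_star))
```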

The error-feedback framework: SGD with delayed gradients

SU Stich, SP Karimireddy - Journal of Machine Learning Research, 2020 - jmlr.org
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-
convex and non-convex functions and derive concise, non-asymptotic, convergence rates …
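
A minimal sketch of SGD with delayed updates in the spirit of this setting: the gradient applied at step t was computed several iterations earlier, at a stale iterate. The fixed delay and constant step size are illustrative assumptions.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr=0.05, delay=5, steps=500):
    # Sketch of SGD with delayed updates: the gradient applied at step t was
    # computed `delay` iterations earlier, at a stale iterate (fixed delay and
    # constant step size are illustrative assumptions).
    x = np.asarray(x0, dtype=float).copy()
    pending = deque()                       # gradients waiting to be applied
    for _ in range(steps):
        pending.append(grad(x))             # gradient at the current iterate
        if len(pending) > delay:
            x = x - lr * pending.popleft()  # apply the stale gradient
    return x

# Toy usage: minimize 0.5 * ||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(np.linalg.norm(delayed_sgd(noisy_grad, np.ones(10))))
```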

The error-feedback framework: Better rates for SGD with delayed gradients and compressed communication

SU Stich, SP Karimireddy - arXiv preprint arXiv:1909.05350, 2019 - arxiv.org
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-
convex and non-convex functions and derive concise, non-asymptotic, convergence rates …
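
The extended version in this entry also treats compressed communication; below is a hedged sketch of the generic error-feedback pattern with a top-k compressor, where whatever the compressor drops is carried over and added to the next update. The top-k choice and the constants are assumptions for illustration.

```python
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries, zero out the rest (a common compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(grad, x0, lr=0.1, k=2, steps=500):
    # Sketch of error feedback: compress (step + carried error), apply the
    # compressed update, and keep the compression residual for the next round.
    x = np.asarray(x0, dtype=float).copy()
    err = np.zeros_like(x)
    for _ in range(steps):
        p = lr * grad(x) + err        # correct the new step with the past error
        delta = top_k(p, k)           # transmit only a sparse update
        err = p - delta               # remember what was dropped
        x = x - delta
    return x

print(ef_sgd(lambda x: x, np.ones(10)))   # toy: minimize 0.5 * ||x||^2
```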

Federated minimax optimization: Improved convergence analyses and algorithms

P Sharma, R Panda, G Joshi… - … on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we consider nonconvex minimax optimization, which is gaining prominence in
many modern machine learning applications, such as GANs. Large-scale edge-based …
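
For readers new to the minimax setting, a hedged sketch of simultaneous gradient descent-ascent on a toy convex-concave saddle-point problem; it illustrates the problem class only and is not the federated algorithm proposed in the paper.

```python
import numpy as np

# Sketch: simultaneous gradient descent-ascent on the toy saddle-point problem
# f(x, y) = x^T A y + 0.5*||x||^2 - 0.5*||y||^2 (strongly convex in x, strongly
# concave in y). This only illustrates the minimax setting; it is not the
# federated algorithm analyzed in the paper.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
x, y = np.ones(5), np.ones(5)
lr = 0.02
for _ in range(5000):
    gx = A @ y + x                    # gradient in x (descend)
    gy = A.T @ x - y                  # gradient in y (ascend)
    x, y = x - lr * gx, y + lr * gy

print(np.linalg.norm(x), np.linalg.norm(y))   # both approach the saddle point at 0
```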

Recent theoretical advances in non-convex optimization

M Danilova, P Dvurechensky, A Gasnikov… - … and Probability: With a …, 2022 - Springer
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …

SGD for structured nonconvex functions: Learning rates, minibatching and interpolation

R Gower, O Sebbouh, N Loizou - … Conference on Artificial …, 2021 - proceedings.mlr.press
Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-convex
functions. Yet, the standard convergence theory for SGD in the smooth non-convex …
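
To make the learning-rate, minibatching, and interpolation theme concrete, a hedged sketch of constant-step minibatch SGD on a least-squares problem that admits an exact fit; the problem sizes, batch size, and step size are illustrative assumptions.

```python
import numpy as np

# Sketch: minibatch SGD with a constant step size on a least-squares problem that
# can be fit exactly (the interpolation regime); the step size, batch size, and
# problem sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, batch = 50, 100, 10             # more parameters than data points
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)        # consistent system: zero loss is attainable

w = np.zeros(d)
lr = 0.02                             # conservative constant step (assumption)
for _ in range(3000):
    idx = rng.choice(n, size=batch, replace=False)
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch   # minibatch gradient
    w = w - lr * g

print("train MSE:", np.mean((A @ w - b) ** 2))     # driven toward zero under interpolation
```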

Practical and matching gradient variance bounds for black-box variational Bayesian inference

K Kim, K Wu, J Oh, JR Gardner - … Conference on Machine …, 2023 - proceedings.mlr.press
Understanding the gradient variance of black-box variational inference (BBVI) is a crucial
step for establishing its convergence and developing algorithmic improvements. However …
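
To make "gradient variance of BBVI" concrete, a hedged sketch that empirically estimates the variance of a single-sample reparameterization gradient of the ELBO for a mean-field Gaussian family against a standard normal target; the target, variational family, and estimator are assumptions for illustration, not the paper's setting.

```python
import numpy as np

# Sketch: empirically measure the variance of a single-sample reparameterization
# gradient of the ELBO, with a mean-field Gaussian q = N(m, diag(exp(2*s))) and a
# standard normal target log p(z) = -0.5*||z||^2 + const. All choices here are
# illustrative assumptions.
rng = np.random.default_rng(0)
d = 10
m = rng.standard_normal(d)            # variational mean
s = np.zeros(d)                       # log standard deviations

def elbo_grad_sample():
    eps = rng.standard_normal(d)
    z = m + np.exp(s) * eps           # reparameterized sample z ~ q
    g_m = -z                          # pathwise gradient of E_q[log p(z)] w.r.t. m
    g_s = -z * np.exp(s) * eps + 1.0  # pathwise term plus the entropy gradient w.r.t. s
    return np.concatenate([g_m, g_s])

grads = np.stack([elbo_grad_sample() for _ in range(10_000)])
print("estimator mean norm:    ", np.linalg.norm(grads.mean(axis=0)))
print("max per-coordinate var.:", grads.var(axis=0).max())
```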

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

C Liu, D Drusvyatskiy, M Belkin… - Advances in neural …, 2024 - proceedings.neurips.cc
Modern machine learning paradigms, such as deep learning, occur in or close to the
interpolation regime, wherein the number of model parameters is much larger than the …

Ultrasparse ultrasparsifiers and faster Laplacian system solvers

A Jambulapati, A Sidford - ACM Transactions on Algorithms, 2021 - dl.acm.org
In this paper we provide an O(m (log log n)^{O(1)} log(1/ε))-expected time algorithm for solving
Laplacian systems on n-node m-edge graphs, improving upon the previous best expected …
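
Not the paper's algorithm, but as a quick illustration of what solving a Laplacian system means: a hedged sketch that assembles the Laplacian of a cycle graph and solves Lx = b with conjugate gradients from SciPy, a simple stand-in for the nearly linear time solvers developed in the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Sketch: build the Laplacian L = D - A of a cycle graph and solve L x = b with
# conjugate gradients, as a stand-in for the much faster solvers in the paper.
n = 100
rows = np.arange(n)
cols = (rows + 1) % n
A = sp.coo_matrix((np.ones(n), (rows, cols)), shape=(n, n))
A = A + A.T                                            # undirected cycle adjacency
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A    # graph Laplacian

b = np.random.default_rng(0).standard_normal(n)
b -= b.mean()       # L is singular; b must be orthogonal to the all-ones vector
x, info = cg(L.tocsr(), b)
print("converged:", info == 0, "residual:", np.linalg.norm(L @ x - b))
```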