DoG is SGD's best friend: A parameter-free dynamic step size schedule

M Ivgi, O Hinder, Y Carmon - International Conference on …, 2023 - proceedings.mlr.press
We propose a tuning-free dynamic SGD step size formula, which we call Distance over
Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from …
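
A minimal sketch of the distance-over-gradients idea in Python, assuming the step size at iteration t is the largest distance from the starting point seen so far divided by the square root of the accumulated squared gradient norms; the function name dog_sgd and the small seed r_eps are illustrative, not the paper's exact specification.

```python
import numpy as np

def dog_sgd(grad, x0, steps=1000, r_eps=1e-6):
    # Illustrative distance-over-gradients rule (an assumption, not the paper's
    # exact algorithm): eta_t = max_{i<=t} ||x_i - x_0|| / sqrt(sum_{i<=t} ||g_i||^2).
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    max_dist = r_eps                     # small seed so the first step is nonzero
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += float(g @ g)
        x = x - (max_dist / (np.sqrt(grad_sq_sum) + 1e-12)) * g
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
    return x

# Toy usage: f(x) = 0.5 * ||x - 1||^2, whose gradient is x - 1.
print(dog_sgd(lambda x: x - np.ones(5), x0=np.zeros(5)))
```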

Acceleration methods

A d'Aspremont, D Scieur, A Taylor - Foundations and Trends® …, 2021 - nowpublishers.com
This monograph covers some recent advances in a range of acceleration techniques
frequently used in convex optimization. We first use quadratic optimization problems to …
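
Since the monograph introduces acceleration through quadratic problems, here is a hedged comparison of plain gradient descent and Nesterov's accelerated method on a strongly convex quadratic; the 1/L step size and the (sqrt(kappa)-1)/(sqrt(kappa)+1) momentum are standard textbook choices, used here as assumptions rather than the monograph's own presentation.

```python
import numpy as np

# Sketch: gradient descent vs. Nesterov acceleration on the quadratic
# f(x) = 0.5 * x^T A x - b^T x, with the textbook step 1/L and momentum
# (sqrt(kappa) - 1) / (sqrt(kappa) + 1); these choices are assumptions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 50))
A = Q.T @ Q + np.eye(50)                 # positive definite Hessian
b = rng.standard_normal(50)
x_star = np.linalg.solve(A, b)

evals = np.linalg.eigvalsh(A)
L, mu = evals.max(), evals.min()         # smoothness and strong convexity constants
beta = (np.sqrt(L / mu) - 1) / (np.sqrt(L / mu) + 1)

def grad(x):
    return A @ x - b

x_gd = np.zeros(50)
x_nag = np.zeros(50)
y = np.zeros(50)
for _ in range(200):
    x_gd = x_gd - grad(x_gd) / L         # plain gradient descent
    x_new = y - grad(y) / L              # Nesterov: gradient step at the extrapolated point
    y = x_new + beta * (x_new - x_nag)   # then extrapolate
    x_nag = x_new

print("GD error:      ", np.linalg.norm(x_gd - x_star))
print("Nesterov error:", np.linalg.norm(x_nag - x_star))
```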

The error-feedback framework: SGD with delayed gradients

SU Stich, SP Karimireddy - Journal of Machine Learning Research, 2020 - jmlr.org
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-
convex and non-convex functions and derive concise, non-asymptotic, convergence rates …
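
A minimal sketch of SGD with delayed updates in the spirit of this setting: the gradient applied at step t was computed several iterations earlier, at a stale iterate. The fixed delay and constant step size are illustrative assumptions.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr=0.05, delay=5, steps=500):
    # Sketch of SGD with delayed updates: the gradient applied at step t was
    # computed `delay` iterations earlier, at a stale iterate (fixed delay and
    # constant step size are illustrative assumptions).
    x = np.asarray(x0, dtype=float).copy()
    pending = deque()                       # gradients waiting to be applied
    for _ in range(steps):
        pending.append(grad(x))             # gradient at the current iterate
        if len(pending) > delay:
            x = x - lr * pending.popleft()  # apply the stale gradient
    return x

# Toy usage: minimize 0.5 * ||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(np.linalg.norm(delayed_sgd(noisy_grad, np.ones(10))))
```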

The error-feedback framework: Better rates for SGD with delayed gradients and compressed communication

SU Stich, SP Karimireddy - arXiv preprint arXiv:1909.05350, 2019 - arxiv.org
We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-
convex and non-convex functions and derive concise, non-asymptotic, convergence rates …
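
The extended version in this entry also treats compressed communication; below is a hedged sketch of the generic error-feedback pattern with a top-k compressor, where whatever the compressor drops is carried over and added to the next update. The top-k choice and the constants are assumptions for illustration.

```python
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude entries, zero out the rest (a common compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(grad, x0, lr=0.1, k=2, steps=500):
    # Sketch of error feedback: compress (step + carried error), apply the
    # compressed update, and keep the compression residual for the next round.
    x = np.asarray(x0, dtype=float).copy()
    err = np.zeros_like(x)
    for _ in range(steps):
        p = lr * grad(x) + err        # correct the new step with the past error
        delta = top_k(p, k)           # transmit only a sparse update
        err = p - delta               # remember what was dropped
        x = x - delta
    return x

print(ef_sgd(lambda x: x, np.ones(10)))   # toy: minimize 0.5 * ||x||^2
```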

Federated minimax optimization: Improved convergence analyses and algorithms

P Sharma, R Panda, G Joshi… - … on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we consider nonconvex minimax optimization, which is gaining prominence in
many modern machine learning applications, such as GANs. Large-scale edge-based …
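
For readers new to the minimax setting, a hedged sketch of simultaneous gradient descent-ascent on a toy convex-concave saddle-point problem; it illustrates the problem class only and is not the federated algorithm proposed in the paper.

```python
import numpy as np

# Sketch: simultaneous gradient descent-ascent on the toy saddle-point problem
# f(x, y) = x^T A y + 0.5*||x||^2 - 0.5*||y||^2 (strongly convex in x, strongly
# concave in y). This only illustrates the minimax setting; it is not the
# federated algorithm analyzed in the paper.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
x, y = np.ones(5), np.ones(5)
lr = 0.02
for _ in range(5000):
    gx = A @ y + x                    # gradient in x (descend)
    gy = A.T @ x - y                  # gradient in y (ascend)
    x, y = x - lr * gx, y + lr * gy

print(np.linalg.norm(x), np.linalg.norm(y))   # both approach the saddle point at 0
```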

Recent theoretical advances in non-convex optimization

M Danilova, P Dvurechensky, A Gasnikov… - … and Probability: With a …, 2022 - Springer
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …

SGD for structured nonconvex functions: Learning rates, minibatching and interpolation

R Gower, O Sebbouh, N Loizou - … Conference on Artificial …, 2021 - proceedings.mlr.press
Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-convex
functions. Yet, the standard convergence theory for SGD in the smooth non-convex …
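
To make the learning-rate, minibatching, and interpolation theme concrete, a hedged sketch of constant-step minibatch SGD on a least-squares problem that admits an exact fit; the problem sizes, batch size, and step size are illustrative assumptions.

```python
import numpy as np

# Sketch: minibatch SGD with a constant step size on a least-squares problem that
# can be fit exactly (the interpolation regime); the step size, batch size, and
# problem sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, batch = 50, 100, 10             # more parameters than data points
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)        # consistent system: zero loss is attainable

w = np.zeros(d)
lr = 0.02                             # conservative constant step (assumption)
for _ in range(3000):
    idx = rng.choice(n, size=batch, replace=False)
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch   # minibatch gradient
    w = w - lr * g

print("train MSE:", np.mean((A @ w - b) ** 2))     # driven toward zero under interpolation
```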

Practical and matching gradient variance bounds for black-box variational Bayesian inference

K Kim, K Wu, J Oh, JR Gardner - … Conference on Machine …, 2023 - proceedings.mlr.press
Understanding the gradient variance of black-box variational inference (BBVI) is a crucial
step for establishing its convergence and developing algorithmic improvements. However …
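
To make "gradient variance of BBVI" concrete, a hedged sketch that empirically estimates the variance of a single-sample reparameterization gradient of the ELBO for a mean-field Gaussian family against a standard normal target; the target, variational family, and estimator are assumptions for illustration, not the paper's setting.

```python
import numpy as np

# Sketch: empirically measure the variance of a single-sample reparameterization
# gradient of the ELBO, with a mean-field Gaussian q = N(m, diag(exp(2*s))) and a
# standard normal target log p(z) = -0.5*||z||^2 + const. All choices here are
# illustrative assumptions.
rng = np.random.default_rng(0)
d = 10
m = rng.standard_normal(d)            # variational mean
s = np.zeros(d)                       # log standard deviations

def elbo_grad_sample():
    eps = rng.standard_normal(d)
    z = m + np.exp(s) * eps           # reparameterized sample z ~ q
    g_m = -z                          # pathwise gradient of E_q[log p(z)] w.r.t. m
    g_s = -z * np.exp(s) * eps + 1.0  # pathwise term plus the entropy gradient w.r.t. s
    return np.concatenate([g_m, g_s])

grads = np.stack([elbo_grad_sample() for _ in range(10_000)])
print("estimator mean norm:    ", np.linalg.norm(grads.mean(axis=0)))
print("max per-coordinate var.:", grads.var(axis=0).max())
```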

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

C Liu, D Drusvyatskiy, M Belkin… - Advances in neural …, 2024 - proceedings.neurips.cc
Modern machine learning paradigms, such as deep learning, occur in or close to the
interpolation regime, wherein the number of model parameters is much larger than the …

Ultrasparse ultrasparsifiers and faster Laplacian system solvers

A Jambulapati, A Sidford - ACM Transactions on Algorithms, 2021 - dl.acm.org
In this paper we provide an O(m (log log n)^{O(1)} log(1/ε))-expected time algorithm for solving
Laplacian systems on n-node m-edge graphs, improving upon the previous best expected …
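
Not the paper's algorithm, but as a quick illustration of what solving a Laplacian system means: a hedged sketch that assembles the Laplacian of a cycle graph and solves Lx = b with conjugate gradients from SciPy, a simple stand-in for the nearly linear time solvers developed in the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Sketch: build the Laplacian L = D - A of a cycle graph and solve L x = b with
# conjugate gradients, as a stand-in for the much faster solvers in the paper.
n = 100
rows = np.arange(n)
cols = (rows + 1) % n
A = sp.coo_matrix((np.ones(n), (rows, cols)), shape=(n, n))
A = A + A.T                                            # undirected cycle adjacency
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A    # graph Laplacian

b = np.random.default_rng(0).standard_normal(n)
b -= b.mean()       # L is singular; b must be orthogonal to the all-ones vector
x, info = cg(L.tocsr(), b)
print("converged:", info == 0, "residual:", np.linalg.norm(L @ x - b))
```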