Learning-rate-free learning by D-Adaptation

A Defazio, K Mishchenko - International Conference on …, 2023 - proceedings.mlr.press
The speed of gradient descent for convex Lipschitz functions is highly dependent on the
choice of learning rate. Setting the learning rate to achieve the optimal convergence rate …

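The core D-Adaptation idea, paraphrased rather than quoted from the paper, is to run gradient descent with a step size proportional to a running estimate d of the unknown distance D = ||x0 - x*|| to the solution, and to grow d using a computable lower bound on D that holds for convex objectives. The sketch below follows that idea only loosely (the exact weighting in the paper differs); `grad`, `x0`, `d0`, and `steps` are illustrative placeholders.

```python
import numpy as np

def d_adapted_gd(grad, x0, steps=1000, d0=1e-6):
    # Gradient descent whose step size scales with a growing estimate d of
    # D = ||x0 - x*||; d is increased via the convexity-based lower bound
    #   D >= sum_i lam_i <g_i, x0 - x_i> / ||sum_i lam_i g_i||.
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    d = d0
    s = np.zeros_like(x0)   # weighted gradient sum: sum_i lam_i g_i
    num = 0.0               # sum_i lam_i <g_i, x0 - x_i>
    grad_sq = 0.0           # sum_i ||g_i||^2 (AdaGrad-norm style denominator)
    for _ in range(steps):
        g = grad(x)
        grad_sq += float(np.dot(g, g))
        lam = d / (np.sqrt(grad_sq) + 1e-12)     # step size d / sqrt(G)
        num += lam * float(np.dot(g, x0 - x))
        s += lam * g
        x = x - lam * g
        d = max(d, num / (np.linalg.norm(s) + 1e-12))  # distance lower bound
    return x
```

The max keeps the estimate non-decreasing, so d only approaches D from below and the step size never overshoots by more than a constant factor of the ideal one.
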
DoG is SGD's best friend: A parameter-free dynamic step size schedule

M Ivgi, O Hinder, Y Carmon - International Conference on …, 2023 - proceedings.mlr.press
We propose a tuning-free dynamic SGD step size formula, which we call Distance over
Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from …

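As I read the DoG rule, the step size at iteration t is the running maximum distance of the iterates from x0 divided by the square root of the running sum of squared gradient norms, with a small seed r_eps used before the iterates have moved. A minimal sketch under that reading; the function names and constants are illustrative, not taken from the paper.

```python
import numpy as np

def dog_gd(grad, x0, steps=1000, r_eps=1e-6):
    # DoG-style step size: eta_t = rbar_t / sqrt(sum_i ||g_i||^2), where
    # rbar_t is the largest distance of any iterate so far from x0,
    # floored at the small seed r_eps.
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    rbar = r_eps          # running max distance from x0
    grad_sq_sum = 0.0     # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += float(np.dot(g, g))
        rbar = max(rbar, float(np.linalg.norm(x - x0)))
        eta = rbar / (np.sqrt(grad_sq_sum) + 1e-12)
        x = x - eta * g
    return x
```

Only r_eps is user-supplied here; it seeds the distance estimate before the first movement, after which the empirical quantities take over.
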
SGD with AdaGrad stepsizes: Full adaptivity with high probability to unknown parameters, unbounded gradients and affine variance

A Attia, T Koren - International Conference on Machine …, 2023 - proceedings.mlr.press
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive
(self-tuning) method for first-order stochastic optimization. Despite being well studied …

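For reference, the scalar ("AdaGrad-norm") form of the AdaGrad stepsize divides a base rate by the square root of the accumulated squared gradient norms. Below is a minimal sketch of that standard rule; the stochastic gradient oracle and constants are placeholders, and the paper's analysis covers settings (high probability, unbounded gradients, affine variance) that this sketch does not model.

```python
import numpy as np

def adagrad_norm_sgd(stoch_grad, x0, steps=1000, eta=1.0, b0=1e-8):
    # Scalar AdaGrad ("AdaGrad-norm") stepsize for SGD:
    # eta_t = eta / sqrt(b0^2 + sum_i ||g_i||^2).
    x = np.asarray(x0, dtype=float).copy()
    acc = b0 ** 2                  # accumulated squared gradient norms
    for _ in range(steps):
        g = stoch_grad(x)          # stochastic gradient at x (placeholder oracle)
        acc += float(np.dot(g, g))
        x = x - (eta / np.sqrt(acc)) * g
    return x
```
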
Parameter-free regret in high probability with heavy tails

J Zhang, A Cutkosky - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We present new algorithms for online convex optimization over unbounded domains that
obtain parameter-free regret in high probability given access only to potentially heavy-tailed …

Prodigy: An expeditiously adaptive parameter-free learner

K Mishchenko, A Defazio - arXiv preprint arXiv:2306.06101, 2023 - arxiv.org
We consider the problem of estimating the learning rate in adaptive methods, such as
AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance …

DoWG unleashed: An efficient universal parameter-free gradient descent method

A Khaled, K Mishchenko, C Jin - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer:
DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient, matching the …

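DoWG differs from DoG, as I understand the rule, by weighting each squared gradient norm in the denominator with the current squared distance estimate and squaring that estimate in the numerator. A sketch under that reading; names and the r_eps seed are illustrative.

```python
import numpy as np

def dowg_gd(grad, x0, steps=1000, r_eps=1e-6):
    # DoWG-style step size: eta_t = rbar_t^2 / sqrt(v_t), where v_t
    # accumulates rbar_i^2 * ||g_i||^2 and rbar_t is the running max
    # distance of the iterates from x0, floored at r_eps.
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    rbar = r_eps   # running max distance from x0
    v = 0.0        # distance-weighted sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        rbar = max(rbar, float(np.linalg.norm(x - x0)))
        v += rbar ** 2 * float(np.dot(g, g))
        eta = rbar ** 2 / (np.sqrt(v) + 1e-12)
        x = x - eta * g
    return x
```
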
Mechanic: A learning rate tuner

A Cutkosky, A Defazio, H Mehta - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce a technique for tuning the learning rate scale factor of any base optimization
algorithm and schedule automatically, which we call Mechanic. Our method provides a …

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …

The price of adaptivity in stochastic convex optimization

Y Carmon, O Hinder - arXiv preprint arXiv:2402.10898, 2024 - arxiv.org
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.
Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) …

A simple uniformly optimal method without line search for convex optimization

T Li, G Lan - arXiv preprint arXiv:2310.10082, 2023 - arxiv.org
Line search (or backtracking) procedures have been widely employed in first-order
methods for solving convex optimization problems, especially those with unknown problem …