M Ivgi, O Hinder, Y Carmon - International Conference on …, 2023 - proceedings.mlr.press
We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from …
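For reference, the DoG rule pairs the largest distance traveled from the initial point with the accumulated squared gradient norms; below is a minimal sketch of that step size in plain NumPy (function and variable names are illustrative, and r_eps is a small initial-movement constant assumed here rather than taken from the paper):

import numpy as np

def dog_sgd(grad, x0, steps=100, r_eps=1e-6):
    # DoG step size: eta_t = r_bar_t / sqrt(sum_{i<=t} ||g_i||^2),
    # where r_bar_t = max(r_eps, max_{i<=t} ||x_i - x0||).
    x0 = np.asarray(x0, dtype=float)
    x, r_bar, g_sq = x0.copy(), r_eps, 0.0
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)      # (stochastic) gradient oracle
        g_sq += float(g @ g)                      # accumulate ||g_i||^2
        r_bar = max(r_bar, float(np.linalg.norm(x - x0)))
        x = x - (r_bar / np.sqrt(g_sq + 1e-12)) * g   # tiny epsilon guards an all-zero gradient
    return x

# Example: minimize ||x - 3||^2 starting from the origin.
x_hat = dog_sgd(lambda x: 2.0 * (x - 3.0), np.zeros(5))

Both quantities in the step size are measured along the trajectory itself, which is what makes the schedule tuning-free in the sense described above.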
A Attia, T Koren - International Conference on Machine …, 2023 - proceedings.mlr.press
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied …
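For context, the scalar (norm) version of the AdaGrad stepsize studied in this line of work divides a base rate by the square root of the accumulated squared gradient norms; here is a minimal sketch under that reading (eta, b0, and the loop structure are illustrative assumptions, not the paper's exact algorithm):

import numpy as np

def adagrad_norm_sgd(grad, x0, eta=1.0, b0=1e-8, steps=100):
    # AdaGrad-Norm: x_{t+1} = x_t - eta / sqrt(b0^2 + sum_{i<=t} ||g_i||^2) * g_t
    x, acc = np.asarray(x0, dtype=float).copy(), b0 ** 2
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)
        acc += float(g @ g)               # accumulate squared gradient norms
        x = x - (eta / np.sqrt(acc)) * g
    return x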
J Zhang, A Cutkosky - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We present new algorithms for online convex optimization over unbounded domains that obtain parameter-free regret in high probability, given access only to potentially heavy-tailed …
We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance …
A Khaled, K Mishchenko, C Jin - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient, matching the …
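DoWG keeps a running maximum distance from the initial point, as in DoG, but weights each squared gradient norm by the distance at which it was observed; a rough sketch under that description (names and the r_eps initialization are illustrative assumptions):

import numpy as np

def dowg_sgd(grad, x0, steps=100, r_eps=1e-6):
    # DoWG: v_t = sum_{i<=t} r_bar_i^2 * ||g_i||^2 and eta_t = r_bar_t^2 / sqrt(v_t),
    # with r_bar_t = max(r_eps, max_{i<=t} ||x_i - x0||).
    x0 = np.asarray(x0, dtype=float)
    x, r_bar, v = x0.copy(), r_eps, 0.0
    for _ in range(steps):
        g = np.asarray(grad(x), dtype=float)
        r_bar = max(r_bar, float(np.linalg.norm(x - x0)))
        v += (r_bar ** 2) * float(g @ g)  # distance-weighted gradient sum
        x = x - (r_bar ** 2 / np.sqrt(v + 1e-12)) * g
    return x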
We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call Mechanic. Our method provides a …
J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …
Y Carmon, O Hinder - arXiv preprint arXiv:2402.10898, 2024 - arxiv.org
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) …
T Li, G Lan - arXiv preprint arXiv:2310.10082, 2023 - arxiv.org
Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem …
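As a concrete example of the backtracking procedures referred to here, a standard Armijo line search shrinks a trial stepsize until a sufficient-decrease condition holds; the sketch below is the generic textbook rule, not the specific procedure proposed in this paper:

import numpy as np

def armijo_step(f, grad_f, x, eta0=1.0, beta=0.5, c=1e-4, max_halvings=50):
    # Shrink eta until f(x - eta*g) <= f(x) - c * eta * ||g||^2 (sufficient decrease).
    x = np.asarray(x, dtype=float)
    g = np.asarray(grad_f(x), dtype=float)
    fx, g_sq, eta = f(x), float(g @ g), eta0
    for _ in range(max_halvings):
        if f(x - eta * g) <= fx - c * eta * g_sq:
            break
        eta *= beta                       # backtrack: geometrically reduce the trial stepsize
    return x - eta * g

The appeal of such procedures in this literature is that they adapt to unknown smoothness constants at the cost of extra function evaluations per iteration.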