We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance …
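The snippet above points to estimating the step size from an estimate of the distance to the solution. As a rough, non-authoritative illustration of that idea, here is a simplified D-Adaptation-style lower-bound estimate; the helper `f_grad`, the toy quadratic, and the exact formulas are assumptions for the example, not the Prodigy update from the paper.

```python
import numpy as np

def distance_estimating_gd(f_grad, x0, d0=1e-6, steps=500):
    """Gradient descent whose step size is scaled by a running lower-bound
    estimate of the distance ||x0 - x*|| (simplified illustration only)."""
    x = x0.copy()
    d = d0                       # current distance estimate
    g_sum = np.zeros_like(x0)    # running sum of gradients
    corr = 0.0                   # running sum of <g_k, x0 - x_k>
    sq_sum = 0.0                 # running sum of ||g_k||^2 (AdaGrad-style)
    for _ in range(steps):
        g = f_grad(x)
        sq_sum += np.dot(g, g)
        g_sum += g
        corr += np.dot(g, x0 - x)
        # Convexity implies sum_k <g_k, x0 - x_k> <= ||sum_k g_k|| * ||x0 - x*||,
        # so corr / ||g_sum|| is a valid lower bound on the distance.
        if np.linalg.norm(g_sum) > 0:
            d = max(d, corr / np.linalg.norm(g_sum))
        lr = d / (np.sqrt(sq_sum) + 1e-12)   # distance estimate over gradient norms
        x = x - lr * g
    return x, d

# toy usage: minimize 0.5 * ||x - 3||^2
x_hat, d_hat = distance_estimating_gd(lambda x: x - 3.0, np.zeros(5))
```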
A Khaled, K Mishchenko, C Jin - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient---matching the …
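To make the "distance over weighted gradients" idea concrete, here is a sketch in which the step size is a running maximum distance from the initial point divided by the square root of a distance-weighted sum of squared gradient norms. Treat the specific weighting and the initialization `r_eps` as my reading of the method name, not the paper's verbatim update.

```python
import numpy as np

def dowg_style_gd(f_grad, x0, r_eps=1e-4, steps=200):
    """Parameter-free gradient descent sketch: step size = (running max distance
    from x0)^2 / sqrt(distance-weighted sum of squared gradient norms)."""
    x = x0.copy()
    r = r_eps      # running estimate of max ||x_k - x0||, initialized small
    v = 0.0        # distance-weighted sum of squared gradient norms
    for _ in range(steps):
        g = f_grad(x)
        r = max(r, np.linalg.norm(x - x0))
        v += r ** 2 * np.dot(g, g)
        lr = r ** 2 / (np.sqrt(v) + 1e-12)
        x = x - lr * g
    return x

# toy usage on a quadratic
x_hat = dowg_style_gd(lambda x: 2.0 * (x - 1.0), np.zeros(3))
```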
We introduce a technique for tuning the learning rate scale factor of any base optimization algorithm and schedule automatically, which we call Mechanic. Our method provides a …
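The entry describes a wrapper around any base optimizer and schedule. The sketch below shows only that wrapper interface, accumulating the base optimizer's updates and applying them through a single scalar scale factor; how the scale is tuned online is the method's actual contribution and is deliberately not reproduced here.

```python
import numpy as np

class ScaleFactorWrapper:
    """Interface sketch: apply the base optimizer's cumulative update through a
    scalar scale factor s, i.e. x_t = x_ref + s * Delta_t.  The online rule for
    tuning s itself is NOT shown; s is kept fixed in this illustration."""

    def __init__(self, x0, base_update, s=1.0):
        self.x_ref = x0.copy()          # reference (initial) point
        self.delta = np.zeros_like(x0)  # cumulative base-optimizer update
        self.s = s                      # learning-rate scale factor
        self.base_update = base_update  # e.g. an SGD or Adam update rule

    def step(self, grad):
        self.delta += self.base_update(grad)   # accumulate the base update
        return self.x_ref + self.s * self.delta

# toy usage: base update is plain gradient descent with unit step
wrapper = ScaleFactorWrapper(np.zeros(4), base_update=lambda g: -g)
x = wrapper.x_ref
for _ in range(20):
    x = wrapper.step(x - 2.0)  # gradient of 0.5 * ||x - 2||^2
```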
In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on …
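The snippet cuts off before stating the step-size rules, but the setting, linesearch-free GD driven by local curvature information, can be illustrated as follows. The specific rule below, which estimates the local Lipschitz constant from consecutive gradients and caps the growth of the step size, is an assumption for illustration rather than the paper's exact choice.

```python
import numpy as np

def adaptive_gd(f_grad, x0, lam0=1e-6, steps=200, growth=np.sqrt(2.0)):
    """Gradient descent with a step size driven by a local smoothness estimate
    L_k ~ ||grad_k - grad_{k-1}|| / ||x_k - x_{k-1}|| (illustrative rule)."""
    x_prev, g_prev = x0.copy(), f_grad(x0)
    x = x_prev - lam0 * g_prev          # one small warm-up step
    lam = lam0
    for _ in range(steps):
        g = f_grad(x)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # inverse local curvature, combined with a bounded growth factor
        local = dx / (2.0 * dg) if dg > 0 else growth * lam
        lam = min(growth * lam, local)
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

# toy usage on a quadratic with unknown smoothness constant
x_hat = adaptive_gd(lambda x: 4.0 * (x + 1.0), np.zeros(3))
```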
Y Carmon, O Hinder - arXiv preprint arXiv:2402.10898, 2024 - arxiv.org
We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) …
T Li, G Lan - arXiv preprint arXiv:2310.10082, 2023 - arxiv.org
Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem …
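For concreteness, here is a standard Armijo backtracking line search for gradient descent, the kind of procedure the entry refers to; the constants `beta` and `c` are conventional choices, not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, f_grad, x0, t0=1.0, beta=0.5, c=1e-4, steps=100):
    """Gradient descent with Armijo backtracking: shrink the trial step until a
    sufficient-decrease condition holds, so no smoothness constant is needed."""
    x = x0.copy()
    for _ in range(steps):
        g = f_grad(x)
        t = t0
        # Armijo condition: f(x - t g) <= f(x) - c * t * ||g||^2
        while f(x - t * g) > f(x) - c * t * np.dot(g, g):
            t *= beta
        x = x - t * g
    return x

# toy usage: minimize a simple smooth convex function
f = lambda x: 0.5 * np.dot(x - 1.0, x - 1.0)
x_hat = backtracking_gd(f, lambda x: x - 1.0, np.zeros(5))
```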
P Latafat, A Themelis… - 6th Annual Learning for …, 2024 - proceedings.mlr.press
Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes AdaPG, a framework that unifies and extends existing results by providing …
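To make the adaptive proximal gradient setting concrete, below is a generic proximal gradient step with an l1 proximal operator and a locally estimated step size; the framework's actual step-size parameterization is not reproduced here, so the specific estimate and the lasso-style toy problem are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def adaptive_prox_grad(grad_f, x0, reg=0.1, lam=1e-3, steps=300, growth=1.2):
    """Proximal gradient sketch for f(x) + reg*||x||_1 with a step size adapted
    from consecutive gradients (illustrative rule, not the paper's)."""
    x_prev, g_prev = x0.copy(), grad_f(x0)
    x = soft_threshold(x_prev - lam * g_prev, lam * reg)
    for _ in range(steps):
        g = grad_f(x)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        if dg > 0:
            lam = min(growth * lam, dx / (2.0 * dg))  # local curvature estimate
        else:
            lam *= growth
        x_prev, g_prev = x, g
        x = soft_threshold(x - lam * g, lam * reg)    # forward step, then prox
    return x

# toy usage: lasso-style objective 0.5*||x - b||^2 + reg*||x||_1
b = np.array([1.0, -0.05, 2.0])
x_hat = adaptive_prox_grad(lambda x: x - b, np.zeros(3), reg=0.1)
```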
Z Liu, Z Zhou - arXiv preprint arXiv:2312.08531, 2023 - arxiv.org
In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has attracted considerable interest due to its good performance in …
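The distinction at stake is between returning the final SGD iterate and returning an averaged iterate; the sketch below only makes that distinction explicit. The quadratic noise model and the 1/sqrt(t) step-size schedule are assumptions for the example.

```python
import numpy as np

def sgd_last_vs_average(stoch_grad, x0, steps=1000, lr0=0.1):
    """Run SGD with a decaying step size and return both the last iterate and
    the running average of the iterates, the two outputs whose convergence
    rates are compared in this line of work."""
    x = x0.copy()
    avg = x0.copy()
    rng = np.random.default_rng(0)
    for t in range(1, steps + 1):
        g = stoch_grad(x, rng)
        x = x - (lr0 / np.sqrt(t)) * g     # 1/sqrt(t) step-size schedule
        avg += (x - avg) / t               # running average of the iterates
    return x, avg

# toy usage: noisy gradients of 0.5 * ||x||^2
last, averaged = sgd_last_vs_average(
    lambda x, rng: x + 0.1 * rng.standard_normal(x.shape), np.zeros(3))
```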
Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam …
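Since the entry names Adam as the canonical example, here is the standard Adam update for reference; the default hyperparameters follow the usual published values, and the block is generic rather than specific to this entry's contribution.

```python
import numpy as np

def adam(f_grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Standard Adam: bias-corrected exponential moving averages of the gradient
    (m) and squared gradient (v) set a per-coordinate step size."""
    x = x0.copy()
    m = np.zeros_like(x0)
    v = np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = f_grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# toy usage on a quadratic
x_hat = adam(lambda x: x - 5.0, np.zeros(4))
```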