Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

On the lower bound of minimizing Polyak-Łojasiewicz functions

P Yue, C Fang, Z Lin - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is weaker than strong convexity but suffices to ensure global convergence of Gradient Descent …
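For reference (our paraphrase, not part of the snippet): a differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if

  $\frac{1}{2}\,\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for all $x$,

which holds for every $\mu$-strongly convex function but also for some non-convex ones; under it, gradient descent converges linearly to the global minimum value.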

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

H Li, Z Lin - Journal of Machine Learning Research, 2023 - jmlr.org
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz
continuous gradient and Hessian. We propose two simple accelerated gradient methods …
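In this setting (our restatement of the title, not a quote from the paper), the goal is to find an $\epsilon$-approximate first-order stationary point, i.e. a point $x$ with

  $\|\nabla f(x)\| \le \epsilon$,

using $O(\epsilon^{-7/4})$ gradient evaluations, with no additional polylogarithmic factor in the bound.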

No-regret dynamics in the Fenchel game: A unified framework for algorithmic convex optimization

JK Wang, J Abernethy, KY Levy - Mathematical Programming, 2024 - Springer
We develop an algorithmic framework for solving convex optimization problems using no-
regret game dynamics. By converting the problem of minimizing a convex function into an …
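One way to read this (a standard identity, not taken from the paper's text): by Fenchel duality, a closed convex $f$ satisfies $f(x) = \sup_y \langle x, y\rangle - f^*(y)$, so minimizing $f$ is equivalent to the zero-sum game

  $\min_x \max_y \; \langle x, y\rangle - f^*(y)$,

and running a no-regret algorithm for each player drives the averaged iterates toward an approximate saddle point, hence toward an approximate minimizer of $f$.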

Provable non-accelerations of the heavy-ball method

B Goujaud, A Taylor, A Dieuleveut - arXiv preprint arXiv:2307.11291, 2023 - arxiv.org
In this work, we show that the heavy-ball (HB) method provably does not reach an
accelerated convergence rate on smooth strongly convex problems. More specifically, we …
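For context, the heavy-ball update in question (standard form, not quoted from the paper) is

  $x_{k+1} = x_k - \alpha\,\nabla f(x_k) + \beta\,(x_k - x_{k-1})$,

with step size $\alpha > 0$ and momentum parameter $\beta \in [0, 1)$.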

Towards understanding GD with hard and conjugate pseudo-labels for test-time adaptation

JK Wang, A Wibisono - arXiv preprint arXiv:2210.10019, 2022 - arxiv.org
We consider a setting in which a model needs to adapt to a new domain under distribution shift,
given that only unlabeled test samples from the new domain are accessible at test time. A …
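As a minimal illustration (not the paper's algorithm; the linear model, loss, and hyperparameters below are assumptions), hard pseudo-labeling lets GD adapt a classifier to unlabeled test data by treating its own argmax predictions as targets:

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adapt_hard_pseudo_labels(W, X, lr=0.1, steps=10):
    # W: (d, k) linear classifier weights; X: (n, d) unlabeled test samples
    n = X.shape[0]
    for _ in range(steps):
        P = softmax(X @ W)             # current class probabilities, (n, k)
        y_hat = P.argmax(axis=1)       # hard pseudo-labels
        Y = np.eye(W.shape[1])[y_hat]  # one-hot targets built from the model's own predictions
        grad = X.T @ (P - Y) / n       # cross-entropy gradient w.r.t. W under the pseudo-labels
        W = W - lr * grad              # plain GD step
    return W

# Toy usage: adapt a random 5-d, 3-class linear classifier to 100 unlabeled test points.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(5, 3))
X_test = rng.normal(size=(100, 5))
W_adapted = adapt_hard_pseudo_labels(W0, X_test)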

Invex programs: First order algorithms and their convergence

A Barik, S Sra, J Honorio - arXiv preprint arXiv:2307.04456, 2023 - arxiv.org
Invex programs are a special class of non-convex problems that attain a global minimum at
every stationary point. While classical first-order gradient descent methods can solve them …
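For reference (standard definition, not quoted from the abstract): a differentiable $f$ is invex if there exists a vector-valued map $\eta(x, y)$ such that

  $f(x) - f(y) \ge \eta(x, y)^\top \nabla f(y)$ for all $x, y$;

equivalently, every stationary point of $f$ is a global minimizer, which is exactly the property the snippet highlights.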

Continuized acceleration for quasar convex functions in non-convex optimization

JK Wang, A Wibisono - arXiv preprint arXiv:2302.07851, 2023 - arxiv.org
Quasar convexity is a condition that allows some first-order methods to efficiently minimize a
function even when the optimization landscape is non-convex. Previous works develop near …
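Concretely (one common definition, not taken verbatim from the paper): $f$ is $\gamma$-quasar convex with respect to a minimizer $x^*$, for some $\gamma \in (0, 1]$, if

  $f(x^*) \ge f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^* - x)$ for all $x$;

taking $\gamma = 1$ recovers star convexity, and any convex function satisfies the inequality at its minimizers.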

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

D Oikonomou, N Loizou - arXiv preprint arXiv:2406.04142, 2024 - arxiv.org
Stochastic gradient descent with momentum, also known as the Stochastic Heavy Ball (SHB) method,
(SHB), is one of the most popular algorithms for solving large-scale stochastic optimization …
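For orientation (standard forms, not quoted from the paper): the SHB update with sampled index $i_k$ is

  $x_{k+1} = x_k - \gamma_k\,\nabla f_{i_k}(x_k) + \beta\,(x_k - x_{k-1})$,

and one common stochastic Polyak step-size is

  $\gamma_k = \dfrac{f_{i_k}(x_k) - f_{i_k}^*}{c\,\|\nabla f_{i_k}(x_k)\|^2}$,

where $f_{i_k}^*$ is the minimum value of the sampled component and $c > 0$ is a constant; the paper studies combining the two.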

Convergence on Thresholding-Based Algorithms for Dictionary-Sparse Recovery

Y Hong, J Lin - Journal of Fourier Analysis and Applications, 2025 - Springer
We study $\ell_0$-synthesis/analysis methods and thresholding-based algorithms for dictionary-sparse recovery from a few linear measurements perturbed with Gaussian …
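One standard way to pose the problem (our paraphrase of the setting, not the paper's exact formulation): with a dictionary $D$, a measurement matrix $A$, and noisy measurements $y = Ax^\natural + e$ where $x^\natural = Dz^\natural$ for some sparse $z^\natural$, the $\ell_0$-synthesis approach solves

  $\min_z \|z\|_0$ subject to $\|ADz - y\|_2 \le \eta$

and returns $\hat{x} = D\hat{z}$, while the $\ell_0$-analysis approach instead enforces sparsity of $D^* x$.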