Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

On the lower bound of minimizing Polyak-Łojasiewicz functions

P Yue, C Fang, Z Lin - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is weaker than strong convexity but suffices to ensure global convergence of Gradient Descent …
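For reference (our paraphrase, not part of the snippet): a differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if

  $\frac{1}{2}\,\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for all $x$,

which holds for every $\mu$-strongly convex function but also for some non-convex ones; under it, gradient descent converges linearly to the global minimum value.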

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

H Li, Z Lin - Journal of Machine Learning Research, 2023 - jmlr.org
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz
continuous gradient and Hessian. We propose two simple accelerated gradient methods …
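In this setting (our restatement of the title, not a quote from the paper), the goal is to find an $\epsilon$-approximate first-order stationary point, i.e. a point $x$ with

  $\|\nabla f(x)\| \le \epsilon$,

using $O(\epsilon^{-7/4})$ gradient evaluations, with no additional polylogarithmic factor in the bound.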

No-regret dynamics in the Fenchel game: A unified framework for algorithmic convex optimization

JK Wang, J Abernethy, KY Levy - Mathematical Programming, 2024 - Springer
We develop an algorithmic framework for solving convex optimization problems using no-
regret game dynamics. By converting the problem of minimizing a convex function into an …
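One way to read this (a standard identity, not taken from the paper's text): by Fenchel duality, a closed convex $f$ satisfies $f(x) = \sup_y \langle x, y\rangle - f^*(y)$, so minimizing $f$ is equivalent to the zero-sum game

  $\min_x \max_y \; \langle x, y\rangle - f^*(y)$,

and running a no-regret algorithm for each player drives the averaged iterates toward an approximate saddle point, hence toward an approximate minimizer of $f$.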

Provable non-accelerations of the heavy-ball method

B Goujaud, A Taylor, A Dieuleveut - arXiv preprint arXiv:2307.11291, 2023 - arxiv.org
In this work, we show that the heavy-ball (HB) method provably does not reach an
accelerated convergence rate on smooth strongly convex problems. More specifically, we …
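For context, the heavy-ball update in question (standard form, not quoted from the paper) is

  $x_{k+1} = x_k - \alpha\,\nabla f(x_k) + \beta\,(x_k - x_{k-1})$,

with step size $\alpha > 0$ and momentum parameter $\beta \in [0, 1)$.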

Towards understanding GD with hard and conjugate pseudo-labels for test-time adaptation

JK Wang, A Wibisono - arXiv preprint arXiv:2210.10019, 2022 - arxiv.org
We consider a setting in which a model needs to adapt to a new domain under distribution shift,
given that only unlabeled test samples from the new domain are accessible at test time. A …
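As a minimal illustration (not the paper's algorithm; the linear model, loss, and hyperparameters below are assumptions), hard pseudo-labeling lets GD adapt a classifier to unlabeled test data by treating its own argmax predictions as targets:

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adapt_hard_pseudo_labels(W, X, lr=0.1, steps=10):
    # W: (d, k) linear classifier weights; X: (n, d) unlabeled test samples
    n = X.shape[0]
    for _ in range(steps):
        P = softmax(X @ W)             # current class probabilities, (n, k)
        y_hat = P.argmax(axis=1)       # hard pseudo-labels
        Y = np.eye(W.shape[1])[y_hat]  # one-hot targets built from the model's own predictions
        grad = X.T @ (P - Y) / n       # cross-entropy gradient w.r.t. W under the pseudo-labels
        W = W - lr * grad              # plain GD step
    return W

# Toy usage: adapt a random 5-d, 3-class linear classifier to 100 unlabeled test points.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(5, 3))
X_test = rng.normal(size=(100, 5))
W_adapted = adapt_hard_pseudo_labels(W0, X_test)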

Invex programs: First order algorithms and their convergence

A Barik, S Sra, J Honorio - arXiv preprint arXiv:2307.04456, 2023 - arxiv.org
Invex programs are a special class of non-convex problems that attain a global minimum at
every stationary point. While classical first-order gradient descent methods can solve them …
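For reference (standard definition, not quoted from the abstract): a differentiable $f$ is invex if there exists a vector-valued map $\eta(x, y)$ such that

  $f(x) - f(y) \ge \eta(x, y)^\top \nabla f(y)$ for all $x, y$;

equivalently, every stationary point of $f$ is a global minimizer, which is exactly the property the snippet highlights.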

Continuized acceleration for quasar convex functions in non-convex optimization

JK Wang, A Wibisono - arXiv preprint arXiv:2302.07851, 2023 - arxiv.org
Quasar convexity is a condition that allows some first-order methods to efficiently minimize a
function even when the optimization landscape is non-convex. Previous works develop near …
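Concretely (one common definition, not taken verbatim from the paper): $f$ is $\gamma$-quasar convex with respect to a minimizer $x^*$, for some $\gamma \in (0, 1]$, if

  $f(x^*) \ge f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^* - x)$ for all $x$;

taking $\gamma = 1$ recovers star convexity, and any convex function satisfies the inequality at its minimizers.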

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

D Oikonomou, N Loizou - arXiv preprint arXiv:2406.04142, 2024 - arxiv.org
Stochastic gradient descent with momentum, also known as the Stochastic Heavy Ball (SHB) method,
(SHB), is one of the most popular algorithms for solving large-scale stochastic optimization …
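For orientation (standard forms, not quoted from the paper): the SHB update with sampled index $i_k$ is

  $x_{k+1} = x_k - \gamma_k\,\nabla f_{i_k}(x_k) + \beta\,(x_k - x_{k-1})$,

and one common stochastic Polyak step-size is

  $\gamma_k = \dfrac{f_{i_k}(x_k) - f_{i_k}^*}{c\,\|\nabla f_{i_k}(x_k)\|^2}$,

where $f_{i_k}^*$ is the minimum value of the sampled component and $c > 0$ is a constant; the paper studies combining the two.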

Convergence on Thresholding-Based Algorithms for Dictionary-Sparse Recovery

Y Hong, J Lin - Journal of Fourier Analysis and Applications, 2025 - Springer
We study $\ell_0$-synthesis/analysis methods and thresholding-based algorithms for dictionary-sparse recovery from a few linear measurements perturbed with Gaussian …
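One standard way to pose the problem (our paraphrase of the setting, not the paper's exact formulation): with a dictionary $D$, a measurement matrix $A$, and noisy measurements $y = Ax^\natural + e$ where $x^\natural = Dz^\natural$ for some sparse $z^\natural$, the $\ell_0$-synthesis approach solves

  $\min_z \|z\|_0$ subject to $\|ADz - y\|_2 \le \eta$

and returns $\hat{x} = D\hat{z}$, while the $\ell_0$-analysis approach instead enforces sparsity of $D^* x$.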