PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …

Enhancing sharpness-aware optimization through variance suppression

B Li, G Giannakis - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Sharpness-aware minimization (SAM) has well-documented merits in enhancing the
generalization of deep neural networks, even without sizable data augmentation. Embracing …

Stochastic distributed optimization under average second-order similarity: Algorithms and analysis

D Lin, Y Han, H Ye, Z Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We study finite-sum distributed optimization problems involving a master node and $n-1$
local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions …

Lower complexity bounds of finite-sum optimization problems: The results and construction

Y Han, G Xie, Z Zhang - Journal of Machine Learning Research, 2024 - jmlr.org
In this paper, we study the lower complexity bounds for finite-sum optimization problems,
where the objective is the average of $n$ individual component functions. We consider a …

A stochastic proximal alternating minimization for nonsmooth and nonconvex optimization

D Driggs, J Tang, J Liang, M Davies… - SIAM Journal on Imaging …, 2021 - SIAM
In this work, we introduce a novel stochastic proximal alternating linearized minimization
algorithm [J. Bolte, S. Sabach, and M. Teboulle, Math. Program., 146 (2014), pp. 459–494] …

Decentralized TD tracking with linear function approximation and its finite-time analysis

G Wang, S Lu, G Giannakis… - Advances in Neural …, 2020 - proceedings.neurips.cc
The present contribution deals with decentralized policy evaluation in multi-agent Markov
decision processes using temporal-difference (TD) methods with linear function …

Faster federated optimization under second-order similarity

A Khaled, C Jin - arXiv preprint arXiv:2209.02257, 2022 - arxiv.org
Federated learning (FL) is a subfield of machine learning where multiple clients try to
collaboratively learn a model over a network under communication constraints. We consider …

Almost tune-free variance reduction

B Li, L Wang, GB Giannakis - International Conference on …, 2020 - proceedings.mlr.press
The variance reduction class of algorithms, including the representative ones SVRG and
SARAH, has well-documented merits for empirical risk minimization problems. However …

A stochastic two-step inertial Bregman proximal alternating linearized minimization algorithm for nonconvex and nonsmooth problems

C Guo, J Zhao, QL Dong - Numerical Algorithms, 2024 - Springer
In this paper, for solving a broad class of large-scale nonconvex and nonsmooth
optimization problems, we propose a stochastic two-step inertial Bregman proximal …

Practical schemes for finding near-stationary points of convex finite-sums

K Zhou, L Tian, AMC So… - … Conference on Artificial …, 2022 - proceedings.mlr.press
In convex optimization, the problem of finding near-stationary points has not been
adequately studied yet, unlike other optimality measures such as the function value. Even in …