Research advances in stochastic gradient descent algorithms

J Shi, D Wang, F Shang, H Zhang - 自动化学报 (Acta Automatica Sinica), 2021 - aas.net.cn
In the field of machine learning, gradient descent is the most important and most fundamental
method for solving optimization problems. As data scales keep growing, traditional gradient
descent algorithms can no longer solve large-scale machine learning problems effectively. The
stochastic gradient descent algorithm, at each iteration …
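As background for this entry, below is a minimal sketch of the plain mini-batch SGD update the survey is concerned with. The least-squares objective, the constant step size, and every identifier are illustrative assumptions, not taken from the paper.

import numpy as np

def sgd(X, y, lr=0.01, epochs=20, batch_size=32, seed=0):
    """Plain mini-batch SGD on f(w) = (1/2n) * ||X w - y||^2 (illustrative objective)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # stochastic gradient on the mini-batch
            w -= lr * grad                                      # SGD step
    return w

With a constant step size the iterates settle into a noise-dominated neighborhood of a minimizer; a decaying step size is the classical remedy.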

A survey on efficient training of transformers

B Zhuang, J Liu, Z Pan, H He, Y Weng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in Transformers have come with a huge requirement on computing
resources, highlighting the importance of developing efficient training techniques to make …

Stochastic nested variance reduction for nonconvex optimization

D Zhou, P Xu, Q Gu - Journal of machine learning research, 2020 - jmlr.org
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
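For concreteness, the two problem templates named in the snippet (a finite-sum average and an expectation) are, in standard form,

\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)
\qquad \text{or} \qquad
\min_{x \in \mathbb{R}^d} f(x) = \mathbb{E}_{\xi}\left[ F(x; \xi) \right],

where each f_i (respectively F(·; ξ)) may be nonconvex; nothing here is specific to the nested estimator proposed in the paper.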

Spiderboost and momentum: Faster variance reduction algorithms

Z Wang, K Ji, Y Zhou, Y Liang… - Advances in Neural …, 2019 - proceedings.neurips.cc
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms,
and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in …
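To make the shared idea behind SARAH/SPIDER concrete, here is a rough sketch of their recursive variance-reduced gradient estimator under a generic finite-sum oracle; the period q, the step size, and the assumed oracle functions grad_full and grad_i are illustrative, and the sketch does not reproduce SpiderBoost's step-size choice or its momentum variant.

import numpy as np

def spider_like(grad_full, grad_i, x0, n, lr=0.05, q=50, T=500, seed=0):
    """SARAH/SPIDER-style recursive variance-reduced gradient estimator.

    grad_full(x)  -> full gradient (1/n) * sum_i grad f_i(x)
    grad_i(x, i)  -> gradient of the single component f_i at x
    """
    rng = np.random.default_rng(seed)
    x_prev, x = x0.copy(), x0.copy()
    v = grad_full(x)                       # checkpoint: exact gradient
    for t in range(1, T + 1):
        x_prev, x = x, x - lr * v          # gradient-style step with the estimator v
        if t % q == 0:
            v = grad_full(x)               # periodic full-gradient refresh
        else:
            i = rng.integers(n)
            # recursive update: correct the previous estimate with one fresh sample
            v = grad_i(x, i) - grad_i(x_prev, i) + v
    return x

The periodic full-gradient refresh keeps the estimator's accumulated error in check between checkpoints.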

∇-prox: Differentiable proximal algorithm modeling for large-scale optimization

Z Lai, K Wei, Y Fu, P Härtel, F Heide - ACM Transactions on Graphics …, 2023 - dl.acm.org
Tasks across diverse application domains can be posed as large-scale optimization
problems; these include graphics, vision, machine learning, imaging, health, scheduling …
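Proximal algorithm modeling builds on the proximal operator; for reference, its standard definition (not anything specific to ∇-prox) is

\operatorname{prox}_{\lambda g}(v) = \arg\min_{x} \; g(x) + \frac{1}{2\lambda} \|x - v\|_2^2 ,

which for g = ‖·‖₁ reduces to elementwise soft-thresholding, prox_{λ‖·‖₁}(v)_i = sign(v_i) · max(|v_i| − λ, 0).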

A simple proximal stochastic gradient method for nonsmooth nonconvex optimization

Z Li, J Li - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum
problems. In particular, the objective function is given by the summation of a differentiable …
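The snippet describes composite objectives made of a smooth (possibly nonconvex) part plus a nonsmooth part; below is a rough proximal stochastic gradient sketch for such a problem. The ℓ1 nonsmooth term, the least-squares smooth term, and the constant step size are chosen purely for illustration and do not reproduce the algorithm or the mini-batch schedule analyzed in the paper.

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd(X, y, lam=0.1, lr=0.01, epochs=20, batch_size=32, seed=0):
    """Proximal SGD for min_w (1/2n)||Xw - y||^2 + lam*||w||_1 (illustrative composite problem)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # stochastic gradient of the smooth part
            w = soft_threshold(w - lr * grad, lr * lam)          # prox step handles the nonsmooth part
    return w

The prox step (soft-thresholding here) replaces the plain SGD update so that the nonsmooth term is handled exactly rather than subdifferentiated.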

Sharp analysis of stochastic optimization under global Kurdyka-Łojasiewicz inequality

I Fatkhullin, J Etesami, N He… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the complexity of finding the global solution to stochastic nonconvex optimization
when the objective function satisfies global Kurdyka-Łojasiewicz (KL) inequality and the …
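For orientation, the best-known special case of the KL condition (exponent 1/2) is the Polyak-Łojasiewicz inequality,

\|\nabla f(x)\|^2 \;\ge\; 2\mu \left( f(x) - f^* \right) \qquad \text{for all } x,

with μ > 0; the general KL inequality replaces the right-hand side with a desingularizing function of f(x) − f*, and the precise global variant analyzed in the paper is not restated here.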

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(ε^(-7/4)) Complexity

H Li, Z Lin - Journal of Machine Learning Research, 2023 - jmlr.org
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz
continuous gradient and Hessian. We propose two simple accelerated gradient methods …
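As background, here is a rough sketch of Nesterov-style accelerated gradient descent with a simple function-value restart heuristic; the restart test, the momentum schedule k/(k+3), and the step size are generic illustrations and do not reproduce the restart criterion behind the paper's O(ε^(-7/4)) bound.

import numpy as np

def restarted_agd(grad, f, x0, lr=0.1, T=1000):
    """Nesterov-style accelerated gradient descent with a function-value restart.

    grad(x) -> gradient of f at x;  f(x) -> objective value (used only for the restart test).
    """
    x, x_prev = x0.copy(), x0.copy()
    k = 0                                   # iterations since the last restart
    for _ in range(T):
        beta = k / (k + 3)                  # standard momentum schedule
        y = x + beta * (x - x_prev)         # extrapolation (momentum) step
        x_next = y - lr * grad(y)           # gradient step at the extrapolated point
        if f(x_next) > f(x):                # heuristic restart: momentum hurt progress
            x_next = x - lr * grad(x)       # fall back to a plain gradient step
            k = 0
        else:
            k += 1
        x_prev, x = x, x_next
    return x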

On the convergence of learning-based iterative methods for nonconvex inverse problems

R Liu, S Cheng, Y He, X Fan, Z Lin… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Numerous tasks at the core of statistics, learning and vision areas are specific cases of
ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been …

SGD converges to global minimum in deep learning via star-convex path

Y Zhou, J Yang, H Zhang, Y Liang, V Tarokh - arXiv preprint arXiv …, 2019 - arxiv.org
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a
variety of deep neural networks. However, there is still a lack of understanding on how and …
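For reference, one standard definition of star-convexity with respect to a global minimizer x* (the property behind the "star-convex path" in the title) is

f(\alpha x^* + (1 - \alpha) x) \;\le\; \alpha f(x^*) + (1 - \alpha) f(x) \qquad \text{for all } x \text{ and } \alpha \in [0, 1],

i.e., f is convex along every segment joining a point to x*, though not necessarily between arbitrary pairs of points.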