Research advances in stochastic gradient descent algorithms

J Shi, D Wang, F Shang, H Zhang - 自动化学报 (Acta Automatica Sinica), 2021 - aas.net.cn
In the field of machine learning, gradient descent is the most important and most fundamental
method for solving optimization problems. As data scales keep growing, traditional gradient
descent algorithms can no longer solve large-scale machine learning problems effectively. The
stochastic gradient descent algorithm, at each iteration …
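As background for this entry, below is a minimal sketch of the plain mini-batch SGD update the survey is concerned with. The least-squares objective, the constant step size, and every identifier are illustrative assumptions, not taken from the paper.

import numpy as np

def sgd(X, y, lr=0.01, epochs=20, batch_size=32, seed=0):
    """Plain mini-batch SGD on f(w) = (1/2n) * ||X w - y||^2 (illustrative objective)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # stochastic gradient on the mini-batch
            w -= lr * grad                                      # SGD step
    return w

With a constant step size the iterates settle into a noise-dominated neighborhood of a minimizer; a decaying step size is the classical remedy.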

A survey on efficient training of transformers

B Zhuang, J Liu, Z Pan, H He, Y Weng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in Transformers have come with a huge requirement on computing
resources, highlighting the importance of developing efficient training techniques to make …

Stochastic nested variance reduction for nonconvex optimization

D Zhou, P Xu, Q Gu - Journal of machine learning research, 2020 - jmlr.org
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
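For concreteness, the two problem templates named in the snippet (a finite-sum average and an expectation) are, in standard form,

\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)
\qquad \text{or} \qquad
\min_{x \in \mathbb{R}^d} f(x) = \mathbb{E}_{\xi}\left[ F(x; \xi) \right],

where each f_i (respectively F(·; ξ)) may be nonconvex; nothing here is specific to the nested estimator proposed in the paper.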

Spiderboost and momentum: Faster variance reduction algorithms

Z Wang, K Ji, Y Zhou, Y Liang… - Advances in Neural …, 2019 - proceedings.neurips.cc
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms,
and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in …
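To make the shared idea behind SARAH/SPIDER concrete, here is a rough sketch of their recursive variance-reduced gradient estimator under a generic finite-sum oracle; the period q, the step size, and the assumed oracle functions grad_full and grad_i are illustrative, and the sketch does not reproduce SpiderBoost's step-size choice or its momentum variant.

import numpy as np

def spider_like(grad_full, grad_i, x0, n, lr=0.05, q=50, T=500, seed=0):
    """SARAH/SPIDER-style recursive variance-reduced gradient estimator.

    grad_full(x)  -> full gradient (1/n) * sum_i grad f_i(x)
    grad_i(x, i)  -> gradient of the single component f_i at x
    """
    rng = np.random.default_rng(seed)
    x_prev, x = x0.copy(), x0.copy()
    v = grad_full(x)                       # checkpoint: exact gradient
    for t in range(1, T + 1):
        x_prev, x = x, x - lr * v          # gradient-style step with the estimator v
        if t % q == 0:
            v = grad_full(x)               # periodic full-gradient refresh
        else:
            i = rng.integers(n)
            # recursive update: correct the previous estimate with one fresh sample
            v = grad_i(x, i) - grad_i(x_prev, i) + v
    return x

The periodic full-gradient refresh keeps the estimator's accumulated error in check between checkpoints.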

∇-prox: Differentiable proximal algorithm modeling for large-scale optimization

Z Lai, K Wei, Y Fu, P Härtel, F Heide - ACM Transactions on Graphics …, 2023 - dl.acm.org
Tasks across diverse application domains can be posed as large-scale optimization
problems; these include graphics, vision, machine learning, imaging, health, scheduling …
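Proximal algorithm modeling builds on the proximal operator; for reference, its standard definition (not anything specific to ∇-prox) is

\operatorname{prox}_{\lambda g}(v) = \arg\min_{x} \; g(x) + \frac{1}{2\lambda} \|x - v\|_2^2 ,

which for g = ‖·‖₁ reduces to elementwise soft-thresholding, prox_{λ‖·‖₁}(v)_i = sign(v_i) · max(|v_i| − λ, 0).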

A simple proximal stochastic gradient method for nonsmooth nonconvex optimization

Z Li, J Li - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum
problems. In particular, the objective function is given by the summation of a differentiable …
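The snippet describes composite objectives made of a smooth (possibly nonconvex) part plus a nonsmooth part; below is a rough proximal stochastic gradient sketch for such a problem. The ℓ1 nonsmooth term, the least-squares smooth term, and the constant step size are chosen purely for illustration and do not reproduce the algorithm or the mini-batch schedule analyzed in the paper.

import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd(X, y, lam=0.1, lr=0.01, epochs=20, batch_size=32, seed=0):
    """Proximal SGD for min_w (1/2n)||Xw - y||^2 + lam*||w||_1 (illustrative composite problem)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # stochastic gradient of the smooth part
            w = soft_threshold(w - lr * grad, lr * lam)          # prox step handles the nonsmooth part
    return w

The prox step (soft-thresholding here) replaces the plain SGD update so that the nonsmooth term is handled exactly rather than subdifferentiated.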

Sharp analysis of stochastic optimization under global Kurdyka-Łojasiewicz inequality

I Fatkhullin, J Etesami, N He… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the complexity of finding the global solution to stochastic nonconvex optimization
when the objective function satisfies global Kurdyka-Łojasiewicz (KL) inequality and the …
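For orientation, the best-known special case of the KL condition (exponent 1/2) is the Polyak-Łojasiewicz inequality,

\|\nabla f(x)\|^2 \;\ge\; 2\mu \left( f(x) - f^* \right) \qquad \text{for all } x,

with μ > 0; the general KL inequality replaces the right-hand side with a desingularizing function of f(x) − f*, and the precise global variant analyzed in the paper is not restated here.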

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(ε^(-7/4)) Complexity

H Li, Z Lin - Journal of Machine Learning Research, 2023 - jmlr.org
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz
continuous gradient and Hessian. We propose two simple accelerated gradient methods …
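As background, here is a rough sketch of Nesterov-style accelerated gradient descent with a simple function-value restart heuristic; the restart test, the momentum schedule k/(k+3), and the step size are generic illustrations and do not reproduce the restart criterion behind the paper's O(ε^(-7/4)) bound.

import numpy as np

def restarted_agd(grad, f, x0, lr=0.1, T=1000):
    """Nesterov-style accelerated gradient descent with a function-value restart.

    grad(x) -> gradient of f at x;  f(x) -> objective value (used only for the restart test).
    """
    x, x_prev = x0.copy(), x0.copy()
    k = 0                                   # iterations since the last restart
    for _ in range(T):
        beta = k / (k + 3)                  # standard momentum schedule
        y = x + beta * (x - x_prev)         # extrapolation (momentum) step
        x_next = y - lr * grad(y)           # gradient step at the extrapolated point
        if f(x_next) > f(x):                # heuristic restart: momentum hurt progress
            x_next = x - lr * grad(x)       # fall back to a plain gradient step
            k = 0
        else:
            k += 1
        x_prev, x = x, x_next
    return x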

On the convergence of learning-based iterative methods for nonconvex inverse problems

R Liu, S Cheng, Y He, X Fan, Z Lin… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Numerous tasks at the core of statistics, learning and vision areas are specific cases of
ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been …

SGD converges to global minimum in deep learning via star-convex path

Y Zhou, J Yang, H Zhang, Y Liang, V Tarokh - arXiv preprint arXiv …, 2019 - arxiv.org
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a
variety of deep neural networks. However, there is still a lack of understanding on how and …
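For reference, one standard definition of star-convexity with respect to a global minimizer x* (the property behind the "star-convex path" in the title) is

f(\alpha x^* + (1 - \alpha) x) \;\le\; \alpha f(x^*) + (1 - \alpha) f(x) \qquad \text{for all } x \text{ and } \alpha \in [0, 1],

i.e., f is convex along every segment joining a point to x*, though not necessarily between arbitrary pairs of points.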