Recent advances in Transformers have come with huge demands on computing resources, highlighting the importance of developing efficient training techniques to make …
D Zhou, P Xu, Q Gu - Journal of machine learning research, 2020 - jmlr.org
We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We …
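The problem class described in this entry has a standard compact formulation; the display below is included only as an illustrative restatement, with d, n, f_i, and F(·;ξ) as generic symbols rather than notation taken from the cited work.

```latex
\min_{x \in \mathbb{R}^d} f(x),
\qquad
f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)
\quad \text{or} \quad
f(x) = \mathbb{E}_{\xi}\bigl[F(x;\xi)\bigr],
```

where each component f_i (or F(·;ξ)) is smooth but possibly nonconvex.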
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in …
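As a rough illustration of the recursive, variance-reduced gradient estimator underlying SARAH/SPIDER-type methods, here is a minimal sketch on a toy finite-sum problem; the toy objective, batch size, step size, and refresh period are assumptions made for the example, and the constant-step update follows the SpiderBoost variant rather than any particular paper's tuned settings.

```python
# Minimal sketch of a SARAH/SPIDER-style recursive variance-reduced estimator
# on a toy nonconvex finite-sum problem (all settings are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    # Gradient of the i-th smooth nonconvex component z^2 / (1 + z^2).
    z = A[i] @ x - b[i]
    return (2 * z / (1 + z**2)**2) * A[i]

def full_grad(x):
    return np.mean([grad_i(x, i) for i in range(n)], axis=0)

def spider(x0, eta=0.02, q=20, batch=10, iters=200):
    x_prev, x = x0.copy(), x0.copy()
    v = full_grad(x)
    for k in range(iters):
        if k % q == 0:
            v = full_grad(x)          # periodic full-gradient refresh
        else:
            S = rng.choice(n, size=batch, replace=False)
            # Recursive estimator: v_k = grad_S(x_k) - grad_S(x_{k-1}) + v_{k-1}
            v = np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in S], axis=0) + v
        x_prev, x = x, x - eta * v    # constant-step (SpiderBoost-style) update
    return x

x = spider(np.ones(d))
print("final gradient norm:", np.linalg.norm(full_grad(x)))
```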
Tasks across diverse application domains can be posed as large-scale optimization problems; these include graphics, vision, machine learning, imaging, health, scheduling …
Z Li, J Li - Advances in neural information processing …, 2018 - proceedings.neurips.cc
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable …
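Since this entry concerns stochastic gradient methods for a composite objective (a finite sum of smooth, possibly nonconvex functions plus a nonsmooth term), the sketch below shows a generic proximal stochastic gradient step; the ℓ1 regularizer, toy loss, and hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch of proximal SGD for (1/n) sum_i f_i(x) + lam * ||x||_1,
# where f_i is a toy smooth nonconvex loss (all settings are illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, eta = 200, 10, 0.01, 0.02
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_batch(x, idx):
    # Minibatch gradient of the smooth part sum_i z_i^2 / (1 + z_i^2).
    z = A[idx] @ x - b[idx]
    return ((2 * z / (1 + z**2)**2)[:, None] * A[idx]).mean(axis=0)

def prox_l1(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.ones(d)
for k in range(500):
    idx = rng.choice(n, size=10, replace=False)
    x = prox_l1(x - eta * grad_batch(x, idx), eta * lam)   # prox-SGD step

print("nonzero coordinates:", np.count_nonzero(x))
```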
We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies the global Kurdyka-Łojasiewicz (KL) inequality and the …
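For reference, one common way to write a global Kurdyka-Łojasiewicz (gradient-dominance) condition for a differentiable f with infimum f* is shown below; exponent and constant conventions differ across papers, so this is only an illustrative form, not the precise definition used in the cited work.

```latex
\|\nabla f(x)\| \;\ge\; c\,\bigl(f(x) - f^{*}\bigr)^{\theta}
\quad \text{for all } x,
\qquad c > 0,\ \theta \in (0, 1),
```

with θ = 1/2 recovering the Polyak-Łojasiewicz inequality ‖∇f(x)‖² ≥ c²(f(x) − f*).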
H Li, Z Lin - Journal of Machine Learning Research, 2023 - jmlr.org
This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz continuous gradient and Hessian. We propose two simple accelerated gradient methods …
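As a schematic illustration of accelerated gradient iterations in the nonconvex setting, here is a generic Nesterov-style loop with a simple function-value restart heuristic; it is not the specific method or analysis of the entry above, and the toy objective and step size are assumptions.

```python
# Minimal sketch of a Nesterov-style accelerated gradient loop with a simple
# restart heuristic on a toy smooth nonconvex objective (illustrative only).
import numpy as np

def agd_restart(f, grad, x0, eta=0.4, iters=300):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = y - eta * grad(y)                   # gradient step at the extrapolated point
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t**2))
        y = x_new + (t - 1) / t_new * (x_new - x)   # momentum / extrapolation
        if f(x_new) > f(x):                         # restart momentum if no decrease
            y, t_new = x_new, 1.0
        x, t = x_new, t_new
    return x

# Toy smooth nonconvex objective f(v) = sum_i v_i^2 / (1 + v_i^2).
f = lambda v: np.sum(v**2 / (1 + v**2))
grad = lambda v: 2 * v / (1 + v**2)**2
print(agd_restart(f, grad, np.array([1.5, -1.0])))
```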
Numerous tasks at the core of statistics, learning and vision areas are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been …
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and …
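For context, the generic SGD update referred to here is simply the following, with i_k a uniformly sampled index (or minibatch) and η_k the step size; this is the textbook form rather than anything specific to the work above.

```latex
x_{k+1} \;=\; x_k \;-\; \eta_k \,\nabla f_{i_k}(x_k),
\qquad i_k \sim \mathrm{Uniform}\{1,\dots,n\}.
```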