The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models …
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an overview of optimization algorithms and theory for training neural networks. First, we discuss …
AS Berahas, J Nocedal… - Advances in Neural …, 2016 - proceedings.neurips.cc
The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods …
In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While training such models is memory-efficient, they are still …
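For context, implicit layers are typically defined by a fixed-point condition rather than an explicit stack of layers; a generic formulation (stated here only as background, not as the cited paper's specific model) is

    z^\star = f_\theta(z^\star, x), \qquad \hat{y} = h(z^\star),

and the memory efficiency comes from differentiating through the fixed point with the implicit function theorem instead of storing intermediate activations:

    \frac{\partial z^\star}{\partial \theta} = \Big(I - \frac{\partial f_\theta}{\partial z}\Big|_{z^\star}\Big)^{-1} \frac{\partial f_\theta}{\partial \theta}\Big|_{z^\star}.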
The backtracking line-search is an effective technique for automatically tuning the step-size in smooth optimization. It guarantees performance similar to that of the theoretically optimal …
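A minimal sketch of the standard Armijo backtracking rule, assuming the objective f and its gradient grad_f are available as Python callables; the constants c and rho and the name backtracking_step are illustrative choices, not taken from the cited work:

    import numpy as np

    def backtracking_step(f, grad_f, x, alpha0=1.0, c=1e-4, rho=0.5, max_iter=50):
        """One steepest-descent step with Armijo backtracking on the step size."""
        g = grad_f(x)
        d = -g                                   # descent direction (negative gradient)
        fx = f(x)
        alpha = alpha0
        for _ in range(max_iter):
            # Armijo sufficient-decrease test: f(x + a*d) <= f(x) + c*a*<g, d>
            if f(x + alpha * d) <= fx + c * alpha * g.dot(d):
                break
            alpha *= rho                         # shrink the step and try again
        return x + alpha * d

    # Example on the quadratic f(x) = ||x||^2:
    x_next = backtracking_step(lambda x: float(x @ x), lambda x: 2.0 * x, np.array([1.0, -2.0]))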
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under …
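A generic template for this class of methods (a sketch only, not the specific update of the cited paper) is a damped, subsampled Newton step

    x_{k+1} = x_k - \eta_k \big(\nabla^2 f_{S_k}(x_k) + \lambda I\big)^{-1} \nabla f_{B_k}(x_k),

where \nabla^2 f_{S_k} and \nabla f_{B_k} are Hessian and gradient estimates on mini-batches S_k and B_k; the interpolation condition means every per-example loss is minimized at the common solution x^\star, so \nabla f_i(x^\star) = 0 for all i.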
The main results on the convergence rate of quasi-Newton minimization methods were obtained under the assumption that the method operates in a neighborhood of the extremum …
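For reference, a canonical member of this class, shown here only as a representative example, is BFGS, whose Hessian-approximation update is

    B_{k+1} = B_k - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k} + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}, \qquad s_k = x_{k+1} - x_k, \; y_k = \nabla f(x_{k+1}) - \nabla f(x_k).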
We study the convergence rate of the well-known Symmetric Rank-1 (SR1) algorithm, which is widely used across applications. Although it has been extensively investigated, SR1 …
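For reference, the SR1 method updates its Hessian approximation with the standard symmetric rank-one secant formula

    B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k}, \qquad s_k = x_{k+1} - x_k, \; y_k = \nabla f(x_{k+1}) - \nabla f(x_k),

with the update skipped whenever the denominator is close to zero.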
We present a novel adaptive optimization algorithm for large-scale machine learning problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …
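A minimal sketch of one common low-cost local smoothness estimate, using successive gradient differences (illustrative only; the snippet does not specify the cited algorithm's estimator), is

    L_k \approx \frac{\|\nabla f(x_k) - \nabla f(x_{k-1})\|}{\|x_k - x_{k-1}\|}, \qquad x_{k+1} = x_k - \frac{1}{L_k} \nabla f(x_k),

which adapts the step size to the locally observed curvature at the cost of storing one extra gradient.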