Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

Adagrad stepsizes: Sharp convergence over nonconvex landscapes

R Ward, X Wu, L Bottou - Journal of Machine Learning Research, 2020 - jmlr.org
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
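As a minimal sketch of the kind of adaptive stepsize this entry describes, the scalar AdaGrad-Norm variant can be written as a single loop that divides a base rate by the accumulated squared gradient norms; the parameter names and toy quadratic below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, n_steps=1000):
    """Scalar AdaGrad-Norm stepsize: divide a base rate by the square
    root of the accumulated squared gradient norms (illustrative names)."""
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(n_steps):
        g = grad(x)                         # (stochastic) gradient at the iterate
        b_sq += g @ g                       # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b_sq)) * g   # stepsize adapts "on the fly"
    return x

# toy quadratic, purely for illustration
if __name__ == "__main__":
    print(adagrad_norm(lambda x: 2.0 * x, x0=np.ones(5)))
```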

Painless stochastic gradient: Interpolation, line-search, and convergence rates

S Vaswani, A Mishkin, I Laradji… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
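A minimal sketch of the method family this entry refers to, SGD with a stochastic Armijo-style backtracking line-search on the sampled loss; the sufficient-decrease constant, backtracking factor, and the `loss_i`/`grad_i` interface are assumptions for illustration.

```python
import numpy as np

def sgd_armijo(loss_i, grad_i, x0, n_samples, eta_max=1.0, c=0.1,
               beta=0.7, n_steps=100, seed=0):
    """SGD where each step backtracks the stepsize on the *sampled* loss
    until an Armijo sufficient-decrease condition holds.
    loss_i(x, i) / grad_i(x, i) evaluate the i-th sample (assumed API)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        i = int(rng.integers(n_samples))    # draw one sample / mini-batch index
        g = grad_i(x, i)
        eta = eta_max                       # reset the trial stepsize
        # shrink eta until f_i(x - eta g) <= f_i(x) - c * eta * ||g||^2
        while eta > 1e-8 and loss_i(x - eta * g, i) > loss_i(x, i) - c * eta * (g @ g):
            eta *= beta
        x = x - eta * g
    return x
```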

Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence

N Loizou, S Vaswani, IH Laradji… - International …, 2021 - proceedings.mlr.press
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
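A minimal sketch of SGD driven by a stochastic Polyak step-size, assuming the per-sample optimal values are known (e.g. zero for interpolating non-negative losses); the cap `eta_max` and the constant `c` are illustrative choices, not the paper's exact tuning.

```python
import numpy as np

def sgd_sps(loss_i, grad_i, x0, n_samples, c=0.5, eta_max=10.0,
            f_star=0.0, n_steps=100, seed=0):
    """SGD with the stochastic Polyak step-size
    eta_k = min((f_i(x_k) - f_i*) / (c * ||grad f_i(x_k)||^2), eta_max).
    Here f_star stands in for f_i* as a single constant
    (e.g. 0 for interpolating non-negative losses) -- an assumption."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        i = int(rng.integers(n_samples))
        g = grad_i(x, i)
        g_sq = g @ g
        if g_sq == 0.0:
            continue                        # this sample is already optimal here
        eta = min((loss_i(x, i) - f_star) / (c * g_sq), eta_max)
        x = x - eta * g
    return x
```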

Simultaneous calibration of multicoordinates for a dual-robot system by solving the AXB = YCZ problem

G Wang, W Li, C Jiang, D Zhu, H Xie… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Multirobot systems have shown great potential in dealing with complicated tasks that are
impossible for a single robot to achieve. One essential problem encountered in …

Variance-reduced decentralized stochastic optimization with accelerated convergence

R Xin, UA Khan, S Kar - IEEE Transactions on Signal …, 2020 - ieeexplore.ieee.org
This paper describes a novel algorithmic framework to minimize a finite-sum of functions
available over a network of nodes. The proposed framework, which we call GT-VR, is …
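A simplified sketch in the spirit of this framework, combining gradient tracking over a network with an SVRG-type variance-reduced estimator; the interface (per-node lists of component gradients, a doubly stochastic mixing matrix `W`) and the epoch structure are assumptions for illustration, not the paper's pseudocode.

```python
import numpy as np

def gt_svrg_sketch(comp_grads, x0, W, alpha=0.01, n_outer=10, seed=0):
    """Decentralized sketch: gradient tracking plus an SVRG-type
    variance-reduced estimator. comp_grads[i] is the list of component
    gradients held by node i, and W is a doubly stochastic mixing matrix
    (a NumPy array) for the network -- all illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = len(comp_grads)
    local_full = lambda i, x: sum(g(x) for g in comp_grads[i]) / len(comp_grads[i])
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))        # one local iterate per node
    V = np.array([local_full(i, X[i]) for i in range(n)])   # VR gradient estimates
    Y = V.copy()                                            # gradient-tracking variables
    for _ in range(n_outer):
        snap = X.copy()                                     # snapshot for the SVRG correction
        mu = np.array([local_full(i, snap[i]) for i in range(n)])
        for _ in range(len(comp_grads[0])):                 # one inner pass per "epoch"
            X = W @ X - alpha * Y                           # mix with neighbors, step along tracker
            V_new = np.empty_like(V)
            for i in range(n):
                j = int(rng.integers(len(comp_grads[i])))   # sample a local component
                V_new[i] = comp_grads[i][j](X[i]) - comp_grads[i][j](snap[i]) + mu[i]
            Y = W @ Y + V_new - V                           # track the average VR gradient
            V = V_new
    return X.mean(axis=0)
```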

Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient

P Latafat, A Themelis, L Stella, P Patrinos - Mathematical Programming, 2024 - Springer
Backtracking linesearch is the de facto approach for minimizing continuously differentiable
functions with locally Lipschitz gradient. In recent years, it has been shown that in the convex …
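A minimal sketch of the baseline this abstract refers to, proximal gradient descent with a backtracking linesearch on the standard quadratic upper bound; `prox_g` and the shrink/re-enlarge factors are illustrative assumptions, and this is not the paper's proposed adaptive algorithm.

```python
import numpy as np

def prox_grad_backtracking(f, grad_f, prox_g, x0, gamma0=1.0, beta=0.5,
                           n_steps=100):
    """Proximal gradient descent with backtracking linesearch: shrink the
    stepsize gamma until the quadratic upper bound
    f(x+) <= f(x) + <grad f(x), x+ - x> + ||x+ - x||^2 / (2 gamma) holds.
    prox_g(v, gamma) is assumed to return prox_{gamma g}(v)."""
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    for _ in range(n_steps):
        g = grad_f(x)
        fx = f(x)
        while True:
            x_new = prox_g(x - gamma * g, gamma)       # candidate proximal gradient step
            d = x_new - x
            if f(x_new) <= fx + g @ d + (d @ d) / (2.0 * gamma):
                break                                  # descent-lemma condition satisfied
            gamma *= beta                              # otherwise shrink the stepsize
        x = x_new
        gamma /= beta                                  # tentatively re-enlarge next time
    return x
```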

Automated inference with adaptive batches

S De, A Yadav, D Jacobs… - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Classical stochastic gradient methods for optimization rely on noisy gradient approximations
that become progressively less accurate as iterates approach a solution. The large noise …
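A minimal sketch of the idea described here, growing the batch size adaptively when the gradient estimate becomes noise-dominated so it stays informative near a solution; the variance test and the doubling rule below are illustrative simplifications, not the paper's exact criterion.

```python
import numpy as np

def big_batch_sgd_sketch(grad_i, x0, n_samples, eta=0.1, batch0=16,
                         theta=1.0, n_steps=50, seed=0):
    """SGD with an adaptively growing batch: enlarge the batch whenever
    the estimated variance of the batch gradient dominates its squared
    norm (illustrative test; grad_i(x, i) returns the i-th gradient)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    batch = batch0
    for _ in range(n_steps):
        idx = rng.choice(n_samples, size=min(batch, n_samples), replace=False)
        G = np.array([grad_i(x, i) for i in idx])      # per-example gradients
        g = G.mean(axis=0)                             # batch gradient estimate
        var = G.var(axis=0).sum() / len(idx)           # estimated variance of the mean
        if var > theta * (g @ g):                      # noise dominates the signal:
            batch = min(2 * batch, n_samples)          # grow the batch for next time
        x = x - eta * g
    return x
```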

Adaptive stochastic variance reduction for non-convex finite-sum minimization

A Kavis, S Skoulakis… - Advances in …, 2022 - proceedings.neurips.cc
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of
$L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider …
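A minimal sketch in the spirit of this entry: a SPIDER-type recursive variance-reduced gradient estimator combined with an AdaGrad-style stepsize built from accumulated estimator norms; the exact stepsize rule and epoch length below are simplifications, not the paper's algorithm.

```python
import numpy as np

def adaspider_sketch(grads, x0, eta=1.0, n_epochs=10, seed=0):
    """SPIDER-style recursive estimator with an adaptive (accumulation-based)
    stepsize. grads is a list of component-gradient functions for a
    finite-sum objective; all parameter choices are illustrative."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x_prev = x = np.asarray(x0, dtype=float)
    acc = 0.0                                          # accumulated squared estimator norms
    for _ in range(n_epochs):
        v = sum(g(x) for g in grads) / n               # periodic full-gradient refresh
        for _ in range(n):
            acc += v @ v
            step = eta / np.sqrt(1.0 + acc)            # adaptive stepsize from accumulation
            x_prev, x = x, x - step * v
            j = int(rng.integers(n))                   # sampled component
            v = grads[j](x) - grads[j](x_prev) + v     # SPIDER recursive update
    return x
```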