Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

Adagrad stepsizes: Sharp convergence over nonconvex landscapes

R Ward, X Wu, L Bottou - Journal of Machine Learning Research, 2020 - jmlr.org
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
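As a minimal sketch of the kind of adaptive stepsize this entry describes, the scalar AdaGrad-Norm variant can be written as a single loop that divides a base rate by the accumulated squared gradient norms; the parameter names and toy quadratic below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, n_steps=1000):
    """Scalar AdaGrad-Norm stepsize: divide a base rate by the square
    root of the accumulated squared gradient norms (illustrative names)."""
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(n_steps):
        g = grad(x)                         # (stochastic) gradient at the iterate
        b_sq += g @ g                       # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b_sq)) * g   # stepsize adapts "on the fly"
    return x

# toy quadratic, purely for illustration
if __name__ == "__main__":
    print(adagrad_norm(lambda x: 2.0 * x, x0=np.ones(5)))
```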

Painless stochastic gradient: Interpolation, line-search, and convergence rates

S Vaswani, A Mishkin, I Laradji… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
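A minimal sketch of the method family this entry refers to, SGD with a stochastic Armijo-style backtracking line-search on the sampled loss; the sufficient-decrease constant, backtracking factor, and the `loss_i`/`grad_i` interface are assumptions for illustration.

```python
import numpy as np

def sgd_armijo(loss_i, grad_i, x0, n_samples, eta_max=1.0, c=0.1,
               beta=0.7, n_steps=100, seed=0):
    """SGD where each step backtracks the stepsize on the *sampled* loss
    until an Armijo sufficient-decrease condition holds.
    loss_i(x, i) / grad_i(x, i) evaluate the i-th sample (assumed API)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        i = int(rng.integers(n_samples))    # draw one sample / mini-batch index
        g = grad_i(x, i)
        eta = eta_max                       # reset the trial stepsize
        # shrink eta until f_i(x - eta g) <= f_i(x) - c * eta * ||g||^2
        while eta > 1e-8 and loss_i(x - eta * g, i) > loss_i(x, i) - c * eta * (g @ g):
            eta *= beta
        x = x - eta * g
    return x
```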

Stochastic polyak step-size for sgd: An adaptive learning rate for fast convergence

N Loizou, S Vaswani, IH Laradji… - International …, 2021 - proceedings.mlr.press
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
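A minimal sketch of SGD driven by a stochastic Polyak step-size, assuming the per-sample optimal values are known (e.g. zero for interpolating non-negative losses); the cap `eta_max` and the constant `c` are illustrative choices, not the paper's exact tuning.

```python
import numpy as np

def sgd_sps(loss_i, grad_i, x0, n_samples, c=0.5, eta_max=10.0,
            f_star=0.0, n_steps=100, seed=0):
    """SGD with the stochastic Polyak step-size
    eta_k = min((f_i(x_k) - f_i*) / (c * ||grad f_i(x_k)||^2), eta_max).
    Here f_star stands in for f_i* as a single constant
    (e.g. 0 for interpolating non-negative losses) -- an assumption."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        i = int(rng.integers(n_samples))
        g = grad_i(x, i)
        g_sq = g @ g
        if g_sq == 0.0:
            continue                        # this sample is already optimal here
        eta = min((loss_i(x, i) - f_star) / (c * g_sq), eta_max)
        x = x - eta * g
    return x
```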

Simultaneous calibration of multicoordinates for a dual-robot system by solving the AXB = YCZ problem

G Wang, W Li, C Jiang, D Zhu, H Xie… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Multirobot systems have shown great potential in dealing with complicated tasks that are
impossible for a single robot to achieve. One essential problem encountered in …

Variance-reduced decentralized stochastic optimization with accelerated convergence

R Xin, UA Khan, S Kar - IEEE Transactions on Signal …, 2020 - ieeexplore.ieee.org
This paper describes a novel algorithmic framework to minimize a finite-sum of functions
available over a network of nodes. The proposed framework, which we call GT-VR, is …
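A simplified sketch in the spirit of this framework, combining gradient tracking over a network with an SVRG-type variance-reduced estimator; the interface (per-node lists of component gradients, a doubly stochastic mixing matrix `W`) and the epoch structure are assumptions for illustration, not the paper's pseudocode.

```python
import numpy as np

def gt_svrg_sketch(comp_grads, x0, W, alpha=0.01, n_outer=10, seed=0):
    """Decentralized sketch: gradient tracking plus an SVRG-type
    variance-reduced estimator. comp_grads[i] is the list of component
    gradients held by node i, and W is a doubly stochastic mixing matrix
    (a NumPy array) for the network -- all illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = len(comp_grads)
    local_full = lambda i, x: sum(g(x) for g in comp_grads[i]) / len(comp_grads[i])
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))        # one local iterate per node
    V = np.array([local_full(i, X[i]) for i in range(n)])   # VR gradient estimates
    Y = V.copy()                                            # gradient-tracking variables
    for _ in range(n_outer):
        snap = X.copy()                                     # snapshot for the SVRG correction
        mu = np.array([local_full(i, snap[i]) for i in range(n)])
        for _ in range(len(comp_grads[0])):                 # one inner pass per "epoch"
            X = W @ X - alpha * Y                           # mix with neighbors, step along tracker
            V_new = np.empty_like(V)
            for i in range(n):
                j = int(rng.integers(len(comp_grads[i])))   # sample a local component
                V_new[i] = comp_grads[i][j](X[i]) - comp_grads[i][j](snap[i]) + mu[i]
            Y = W @ Y + V_new - V                           # track the average VR gradient
            V = V_new
    return X.mean(axis=0)
```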

Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient

P Latafat, A Themelis, L Stella, P Patrinos - Mathematical Programming, 2024 - Springer
Backtracking linesearch is the de facto approach for minimizing continuously differentiable
functions with locally Lipschitz gradient. In recent years, it has been shown that in the convex …
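A minimal sketch of the baseline this abstract refers to, proximal gradient descent with a backtracking linesearch on the standard quadratic upper bound; `prox_g` and the shrink/re-enlarge factors are illustrative assumptions, and this is not the paper's proposed adaptive algorithm.

```python
import numpy as np

def prox_grad_backtracking(f, grad_f, prox_g, x0, gamma0=1.0, beta=0.5,
                           n_steps=100):
    """Proximal gradient descent with backtracking linesearch: shrink the
    stepsize gamma until the quadratic upper bound
    f(x+) <= f(x) + <grad f(x), x+ - x> + ||x+ - x||^2 / (2 gamma) holds.
    prox_g(v, gamma) is assumed to return prox_{gamma g}(v)."""
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    for _ in range(n_steps):
        g = grad_f(x)
        fx = f(x)
        while True:
            x_new = prox_g(x - gamma * g, gamma)       # candidate proximal gradient step
            d = x_new - x
            if f(x_new) <= fx + g @ d + (d @ d) / (2.0 * gamma):
                break                                  # descent-lemma condition satisfied
            gamma *= beta                              # otherwise shrink the stepsize
        x = x_new
        gamma /= beta                                  # tentatively re-enlarge next time
    return x
```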

Automated inference with adaptive batches

S De, A Yadav, D Jacobs… - Artificial Intelligence and …, 2017 - proceedings.mlr.press
Classical stochastic gradient methods for optimization rely on noisy gradient approximations
that become progressively less accurate as iterates approach a solution. The large noise …
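A minimal sketch of the idea described here, growing the batch size adaptively when the gradient estimate becomes noise-dominated so it stays informative near a solution; the variance test and the doubling rule below are illustrative simplifications, not the paper's exact criterion.

```python
import numpy as np

def big_batch_sgd_sketch(grad_i, x0, n_samples, eta=0.1, batch0=16,
                         theta=1.0, n_steps=50, seed=0):
    """SGD with an adaptively growing batch: enlarge the batch whenever
    the estimated variance of the batch gradient dominates its squared
    norm (illustrative test; grad_i(x, i) returns the i-th gradient)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    batch = batch0
    for _ in range(n_steps):
        idx = rng.choice(n_samples, size=min(batch, n_samples), replace=False)
        G = np.array([grad_i(x, i) for i in idx])      # per-example gradients
        g = G.mean(axis=0)                             # batch gradient estimate
        var = G.var(axis=0).sum() / len(idx)           # estimated variance of the mean
        if var > theta * (g @ g):                      # noise dominates the signal:
            batch = min(2 * batch, n_samples)          # grow the batch for next time
        x = x - eta * g
    return x
```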

Adaptive stochastic variance reduction for non-convex finite-sum minimization

A Kavis, S Skoulakis… - Advances in …, 2022 - proceedings.neurips.cc
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of
$L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider …
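A minimal sketch in the spirit of this entry: a SPIDER-type recursive variance-reduced gradient estimator combined with an AdaGrad-style stepsize built from accumulated estimator norms; the exact stepsize rule and epoch length below are simplifications, not the paper's algorithm.

```python
import numpy as np

def adaspider_sketch(grads, x0, eta=1.0, n_epochs=10, seed=0):
    """SPIDER-style recursive estimator with an adaptive (accumulation-based)
    stepsize. grads is a list of component-gradient functions for a
    finite-sum objective; all parameter choices are illustrative."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x_prev = x = np.asarray(x0, dtype=float)
    acc = 0.0                                          # accumulated squared estimator norms
    for _ in range(n_epochs):
        v = sum(g(x) for g in grads) / n               # periodic full-gradient refresh
        for _ in range(n):
            acc += v @ v
            step = eta / np.sqrt(1.0 + acc)            # adaptive stepsize from accumulation
            x_prev, x = x, x - step * v
            j = int(rng.integers(n))                   # sampled component
            v = grads[j](x) - grads[j](x_prev) + v     # SPIDER recursive update
    return x
```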