Learning-rate annealing methods for deep neural networks

K Nakamura, B Derbel, KJ Won, BW Hong - Electronics, 2021 - mdpi.com
Deep neural networks (DNNs) have achieved great success in recent decades. DNNs are
optimized using stochastic gradient descent (SGD) with learning-rate annealing, which …
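
The survey compares schedule families rather than proposing a single rule. Purely as a hedged sketch of two schedules such surveys typically cover, here are step decay and cosine annealing in Python; the function names and every hyperparameter value are illustrative choices, not the paper's:

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=30):
    # Multiply the rate by `drop` every `epochs_per_drop` epochs (values illustrative).
    return lr0 * drop ** (epoch // epochs_per_drop)

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    # Decay smoothly from lr0 to lr_min over `total_epochs` along a half cosine.
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

if __name__ == "__main__":
    for epoch in (0, 30, 60, 90):
        print(epoch, step_decay(0.1, epoch), cosine_annealing(0.1, epoch, 90))
```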

Direct acceleration of SAGA using sampled negative momentum

K Zhou, Q Ding, F Shang, J Cheng… - The 22nd …, 2019 - proceedings.mlr.press
Variance reduction is a simple and effective technique that accelerates convex (or
non-convex) stochastic optimization. Among existing variance reduction methods, SVRG and …
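
The sampled-negative-momentum acceleration is the paper's contribution and is not reproduced here; as a hedged baseline, a minimal sketch of the plain SAGA update it builds on, applied to a synthetic least-squares problem (the step size and data are assumptions):

```python
import numpy as np

def saga(X, y, lr=0.01, n_iters=5000, seed=0):
    # Plain SAGA on least squares f_i(w) = 0.5 * (x_i . w - y_i)^2.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    table = np.zeros((n, d))             # last stored gradient for each sample i
    avg = table.mean(axis=0)             # running average of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)
        g_new = (X[i] @ w - y[i]) * X[i]
        w -= lr * (g_new - table[i] + avg)
        avg = avg + (g_new - table[i]) / n   # keep avg consistent with the table
        table[i] = g_new
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    print(np.linalg.norm(saga(X, X @ w_true) - w_true))
```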

A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization

M Yang, A Milzarek, Z Wen, T Zhang - Mathematical Programming, 2022 - Springer
In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a
class of nonsmooth nonconvex composite optimization problems. We assume that the …
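
The extra-step quasi-Newton scheme itself is beyond a snippet; as a hedged sketch of the nonsmooth composite setting only, a baseline proximal stochastic gradient loop for min_x 0.5/n ||Ax - b||^2 + lam * ||x||_1, where the quadratic loss, step size, and lam are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1, the nonsmooth term used here for illustration.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd(A, b, lam=0.05, lr=0.01, n_iters=20000, seed=0):
    # Baseline proximal stochastic gradient for the composite objective above.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]          # stochastic gradient of the smooth part
        x = soft_threshold(x - lr * g, lr * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(300, 30))
    x_true = np.where(rng.random(30) < 0.2, rng.normal(size=30), 0.0)
    x_hat = prox_sgd(A, A @ x_true)
    print(np.count_nonzero(np.abs(x_hat) > 1e-3), "nonzero entries recovered")
```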

Stochastic batch size for adaptive regularization in deep network optimization

K Nakamura, S Soatto, BW Hong - Pattern Recognition, 2022 - Elsevier
We propose a first-order stochastic optimization algorithm incorporating adaptive
regularization for pattern recognition problems in a deep learning framework. The adaptive …
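
The adaptive-regularization mechanism is the paper's own; as a hedged illustration of the stochastic-batch-size idea in isolation, SGD on synthetic least squares with the batch size resampled every iteration (the uniform distribution and size range are assumptions):

```python
import numpy as np

def sgd_random_batch(X, y, lr=0.05, n_iters=500, min_bs=8, max_bs=128, seed=0):
    # SGD on least squares with the mini-batch size itself resampled each step.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        bs = int(rng.integers(min_bs, min(max_bs, n) + 1))  # random size this step
        idx = rng.choice(n, size=bs, replace=False)
        w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / bs
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 10))
    w_true = rng.normal(size=10)
    print(np.linalg.norm(sgd_random_batch(X, X @ w_true) - w_true))
```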

Accelerating SGD using flexible variance reduction on large-scale datasets

M Tang, L Qiao, Z Huang, X Liu, Y Peng… - Neural Computing and …, 2020 - Springer
Stochastic gradient descent (SGD) is a popular optimization method widely used in machine
learning, but the variance of its gradient estimates leads to slow convergence. To accelerate …
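
The flexible variance-reduction scheme is the paper's contribution and differs in how the snapshot is scheduled; as a hedged sketch of the idea it builds on, plain SVRG on synthetic least squares (step size, epoch count, and data are assumptions):

```python
import numpy as np

def svrg(X, y, lr=0.01, n_epochs=20, seed=0):
    # Plain SVRG on least squares; the snapshot's full gradient corrects each
    # stochastic step so the estimator's variance vanishes near the optimum.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n        # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            g = (X[i] @ w - y[i]) * X[i]
            g_snap = (X[i] @ w_snap - y[i]) * X[i]
            w -= lr * (g - g_snap + full_grad)        # variance-reduced step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    print(np.linalg.norm(svrg(X, X @ w_true) - w_true))
```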

[PDF] Accelerating Finite-sum Convex Optimization and Highly-smooth Convex Optimization

K ZHOU - 2019 - jnhujnhu.github.io
In the last half-century, a great deal of literature has been devoted to convex optimization.
If no problem structure beyond convexity is assumed, very strong results are …