Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
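
As background for the methods collected below, a minimal NumPy sketch of one mini-batch SGD step with heavy-ball momentum; the function and parameter names are illustrative, not taken from the survey.

    import numpy as np

    def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
        """One mini-batch SGD step with heavy-ball momentum (illustrative sketch)."""
        for name in params:
            velocity[name] = momentum * velocity[name] - lr * grads[name]  # accumulate velocity
            params[name] += velocity[name]                                 # move along velocity
        return params, velocity

    # Usage: params, grads and velocity are dicts of NumPy arrays keyed by parameter name.
    params = {"w": np.zeros(3)}
    grads = {"w": np.array([0.2, -0.1, 0.05])}
    velocity = {"w": np.zeros(3)}
    params, velocity = sgd_momentum_step(params, grads, velocity)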

Adabelief optimizer: Adapting stepsizes by the belief in observed gradients

J Zhuang, T Tang, Y Ding… - Advances in neural …, 2020 - proceedings.neurips.cc
Most popular optimizers for deep learning can be broadly categorized as adaptive methods
(e.g., Adam) and accelerated schemes (e.g., stochastic gradient descent (SGD) with …
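
The central change relative to Adam is that the second-moment estimate tracks the squared deviation of the gradient from its exponential moving average (the "belief"), rather than the raw squared gradient. A minimal NumPy sketch under that reading; the bias-correction details and hyperparameter names follow common Adam conventions and are illustrative.

    import numpy as np

    def adabelief_step(theta, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """AdaBelief-style update (sketch): s tracks (g - m)**2 instead of g**2."""
        m = b1 * m + (1 - b1) * g                   # EMA of gradients
        s = b2 * s + (1 - b2) * (g - m) ** 2 + eps  # EMA of squared deviation from m
        m_hat = m / (1 - b1 ** t)                   # bias corrections (t starts at 1)
        s_hat = s / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
        return theta, m, s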

Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …
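
A rough sketch of the flavor of such a Nesterov-style update, which estimates the lookahead gradient through gradient differences; this is an illustration of the general structure only, and the coefficient conventions and decoupled weight decay of the actual Adan algorithm are not reproduced here.

    import numpy as np

    def adan_like_step(theta, g, g_prev, m, v, n, lr=1e-3,
                       b1=0.02, b2=0.08, b3=0.01, eps=1e-8):
        """Nesterov-momentum-style adaptive step using the gradient difference
        (g - g_prev) as a lookahead correction (illustrative sketch only)."""
        d = g - g_prev
        m = (1 - b1) * m + b1 * g                        # EMA of gradients
        v = (1 - b2) * v + b2 * d                        # EMA of gradient differences
        n = (1 - b3) * n + b3 * (g + (1 - b2) * d) ** 2  # EMA of squared corrected gradient
        update = (m + (1 - b2) * v) / (np.sqrt(n) + eps)
        return theta - lr * update, m, v, n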

Fixmatch: Simplifying semi-supervised learning with consistency and confidence

K Sohn, D Berthelot, N Carlini… - Advances in neural …, 2020 - proceedings.neurips.cc
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data
to improve a model's performance. This domain has seen fast progress recently, at the cost …
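
The core of the method is a confidence-thresholded pseudo-label computed from a weakly augmented view and used as the target for a strongly augmented view. A minimal PyTorch-style sketch of the unlabeled loss term; `weak_aug`, `strong_aug` and the threshold value are placeholders.

    import torch
    import torch.nn.functional as F

    def fixmatch_unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
        """FixMatch-style consistency loss (sketch): pseudo-label the weak view,
        apply cross-entropy on the strong view, masked by confidence >= tau."""
        with torch.no_grad():
            probs = torch.softmax(model(weak_aug(x_unlabeled)), dim=-1)
            conf, pseudo = probs.max(dim=-1)        # confidence and hard pseudo-label
            mask = (conf >= tau).float()            # keep only confident examples
        logits_strong = model(strong_aug(x_unlabeled))
        loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
        return (mask * loss).mean()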

A modified Adam algorithm for deep neural network optimization

M Reyad, AM Sarhan, M Arafa - Neural Computing and Applications, 2023 - Springer
Deep Neural Networks (DNNs) are widely regarded as the most effective learning
tool for dealing with large datasets, and they have been successfully used in thousands of …
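
For reference, the standard Adam update that such modifications start from; this sketch shows only the unmodified baseline, not the variant proposed in the paper.

    import numpy as np

    def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """Standard Adam step (baseline sketch, not the modified algorithm)."""
        m = b1 * m + (1 - b1) * g         # first-moment EMA
        v = b2 * v + (1 - b2) * g ** 2    # second-moment EMA of squared gradients
        m_hat = m / (1 - b1 ** t)         # bias corrections (t starts at 1)
        v_hat = v / (1 - b2 ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v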

On the variance of the adaptive learning rate and beyond

L Liu, H Jiang, P He, W Chen, X Liu, J Gao… - arXiv preprint arXiv …, 2019 - arxiv.org
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …
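
The proposed rectification switches off the adaptive term while too few gradients have been seen for the second-moment estimate to be reliable, which removes the need for a warmup stage. A sketch of the rectified update as it is commonly implemented; treat the exact threshold and bias-correction details as approximate.

    import numpy as np

    def radam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """Rectified-Adam-style step (sketch): use the adaptive term only once the
        variance of the adaptive learning rate is tractable (later iterations)."""
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)
        rho_inf = 2.0 / (1.0 - b2) - 1.0
        rho_t = rho_inf - 2.0 * t * b2 ** t / (1.0 - b2 ** t)
        if rho_t > 4.0:   # enough history: rectified adaptive update
            v_hat = np.sqrt(v / (1 - b2 ** t))
            r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                          ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            theta = theta - lr * r_t * m_hat / (v_hat + eps)
        else:             # early iterations: fall back to un-adapted momentum SGD
            theta = theta - lr * m_hat
        return theta, m, v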

High-frequency component helps explain the generalization of convolutional neural networks

H Wang, X Wu, Z Huang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We investigate the relationship between the frequency spectrum of image data and the
generalization behavior of convolutional neural networks (CNN). We first notice CNN's …
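
The analysis rests on decomposing an image into low- and high-frequency components with the Fourier transform and a radial mask; a minimal NumPy sketch of that decomposition, where the cut-off radius is an arbitrary illustrative value.

    import numpy as np

    def split_frequencies(img, radius=12):
        """Split a grayscale image (2-D array) into low- and high-frequency parts
        using a centered radial mask in the Fourier domain (illustrative sketch)."""
        f = np.fft.fftshift(np.fft.fft2(img))
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        low_mask = dist <= radius
        low = np.real(np.fft.ifft2(np.fft.ifftshift(f * low_mask)))
        high = np.real(np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)))
        return low, high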

Activated gradients for deep neural networks

M Liu, L Chen, X Du, L Jin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks often suffer from poor performance or even training failure due to
ill-conditioning, the vanishing/exploding gradient problem, and the saddle point …
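
The general recipe is to pass each gradient through a bounded, monotone element-wise "gradient activation" before the optimizer step, compressing exploding components while leaving small ones comparatively intact. The sketch below uses arcsinh purely as an illustrative choice; it is not the specific activation function proposed in the paper.

    import numpy as np

    def activate_gradient(g, scale=1.0):
        """Illustrative gradient activation: logarithmic growth for large |g| via
        arcsinh (a stand-in example, not the paper's activation function)."""
        return np.arcsinh(scale * g) / scale

    def sgd_step_with_activated_grad(theta, g, lr=0.01):
        """Plain SGD step applied to the activated gradient."""
        return theta - lr * activate_gradient(g)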

Communication-efficient adaptive federated learning

Y Wang, L Lin, J Chen - International Conference on …, 2022 - proceedings.mlr.press
Federated learning is a machine learning training paradigm that enables clients to jointly
train models without sharing their own localized data. However, the implementation of …
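
A common recipe in this line of work combines FedAvg-style local training, compression of the client-to-server updates, and an adaptive Adam-like optimizer on the server that treats the averaged client update as a pseudo-gradient. The sketch below shows that loop shape with top-k sparsification; it is a generic illustration, not the specific algorithm of the paper.

    import numpy as np

    def top_k_sparsify(delta, k):
        """Compress a flattened (1-D) client update by keeping its k largest-magnitude entries."""
        out = np.zeros_like(delta)
        idx = np.argsort(np.abs(delta))[-k:]
        out[idx] = delta[idx]
        return out

    def adaptive_server_step(w, avg_delta, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-3):
        """Adaptive server update driven by the averaged (compressed) client update."""
        m = b1 * m + (1 - b1) * avg_delta
        v = b2 * v + (1 - b2) * avg_delta ** 2
        return w + lr * m / (np.sqrt(v) + eps), m, v

    # One communication round (sketch): each client sends a compressed update, the
    # server averages them and applies the adaptive step to the global model w.
    # avg_delta = np.mean([top_k_sparsify(d, k) for d in client_deltas], axis=0)
    # w, m, v = adaptive_server_step(w, avg_delta, m, v)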

Towards explaining the regularization effect of initial large learning rate in training neural networks

Y Li, C Wei, T Ma - Advances in neural information …, 2019 - proceedings.neurips.cc
Stochastic gradient descent with a large initial learning rate is widely used for training
modern neural net architectures. Although a small initial learning rate allows for faster …
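
The training recipe being analyzed is a large initial learning rate followed by annealing; a tiny sketch of such a step-decay schedule, with arbitrary illustrative constants.

    def step_decay_lr(epoch, base_lr=0.1, drop=0.1, every=30):
        """Large initial learning rate annealed by a multiplicative drop every
        `every` epochs (illustrative constants, not taken from the paper)."""
        return base_lr * drop ** (epoch // every)

    # epochs 0-29 -> 0.1, epochs 30-59 -> 0.01, epochs 60-89 -> 0.001
    assert abs(step_decay_lr(0) - 0.1) < 1e-12
    assert abs(step_decay_lr(45) - 0.01) < 1e-12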