Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
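
As background for the methods collected below, a minimal NumPy sketch of one mini-batch SGD step with heavy-ball momentum; the function and parameter names are illustrative, not taken from the survey.

    import numpy as np

    def sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9):
        """One mini-batch SGD step with heavy-ball momentum (illustrative sketch)."""
        for name in params:
            velocity[name] = momentum * velocity[name] - lr * grads[name]  # accumulate velocity
            params[name] += velocity[name]                                 # move along velocity
        return params, velocity

    # Usage: params, grads and velocity are dicts of NumPy arrays keyed by parameter name.
    params = {"w": np.zeros(3)}
    grads = {"w": np.array([0.2, -0.1, 0.05])}
    velocity = {"w": np.zeros(3)}
    params, velocity = sgd_momentum_step(params, grads, velocity)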

Adabelief optimizer: Adapting stepsizes by the belief in observed gradients

J Zhuang, T Tang, Y Ding… - Advances in neural …, 2020 - proceedings.neurips.cc
Most popular optimizers for deep learning can be broadly categorized as adaptive methods
(e.g., Adam) and accelerated schemes (e.g., stochastic gradient descent (SGD) with …
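
The central change relative to Adam is that the second-moment estimate tracks the squared deviation of the gradient from its exponential moving average (the "belief"), rather than the raw squared gradient. A minimal NumPy sketch under that reading; the bias-correction details and hyperparameter names follow common Adam conventions and are illustrative.

    import numpy as np

    def adabelief_step(theta, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """AdaBelief-style update (sketch): s tracks (g - m)**2 instead of g**2."""
        m = b1 * m + (1 - b1) * g                   # EMA of gradients
        s = b2 * s + (1 - b2) * (g - m) ** 2 + eps  # EMA of squared deviation from m
        m_hat = m / (1 - b1 ** t)                   # bias corrections (t starts at 1)
        s_hat = s / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
        return theta, m, s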

Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …
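
A rough sketch of the flavor of such a Nesterov-style update, which estimates the lookahead gradient through gradient differences; this is an illustration of the general structure only, and the coefficient conventions and decoupled weight decay of the actual Adan algorithm are not reproduced here.

    import numpy as np

    def adan_like_step(theta, g, g_prev, m, v, n, lr=1e-3,
                       b1=0.02, b2=0.08, b3=0.01, eps=1e-8):
        """Nesterov-momentum-style adaptive step using the gradient difference
        (g - g_prev) as a lookahead correction (illustrative sketch only)."""
        d = g - g_prev
        m = (1 - b1) * m + b1 * g                        # EMA of gradients
        v = (1 - b2) * v + b2 * d                        # EMA of gradient differences
        n = (1 - b3) * n + b3 * (g + (1 - b2) * d) ** 2  # EMA of squared corrected gradient
        update = (m + (1 - b2) * v) / (np.sqrt(n) + eps)
        return theta - lr * update, m, v, n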

Fixmatch: Simplifying semi-supervised learning with consistency and confidence

K Sohn, D Berthelot, N Carlini… - Advances in neural …, 2020 - proceedings.neurips.cc
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data
to improve a model's performance. This domain has seen fast progress recently, at the cost …
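
The core of the method is a confidence-thresholded pseudo-label computed from a weakly augmented view and used as the target for a strongly augmented view. A minimal PyTorch-style sketch of the unlabeled loss term; `weak_aug`, `strong_aug` and the threshold value are placeholders.

    import torch
    import torch.nn.functional as F

    def fixmatch_unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
        """FixMatch-style consistency loss (sketch): pseudo-label the weak view,
        apply cross-entropy on the strong view, masked by confidence >= tau."""
        with torch.no_grad():
            probs = torch.softmax(model(weak_aug(x_unlabeled)), dim=-1)
            conf, pseudo = probs.max(dim=-1)        # confidence and hard pseudo-label
            mask = (conf >= tau).float()            # keep only confident examples
        logits_strong = model(strong_aug(x_unlabeled))
        loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
        return (mask * loss).mean()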

A modified Adam algorithm for deep neural network optimization

M Reyad, AM Sarhan, M Arafa - Neural Computing and Applications, 2023 - Springer
Deep Neural Networks (DNNs) are widely regarded as the most effective learning
tool for dealing with large datasets, and they have been successfully used in thousands of …
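
For reference, the standard Adam update that such modifications start from; this sketch shows only the unmodified baseline, not the variant proposed in the paper.

    import numpy as np

    def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """Standard Adam step (baseline sketch, not the modified algorithm)."""
        m = b1 * m + (1 - b1) * g         # first-moment EMA
        v = b2 * v + (1 - b2) * g ** 2    # second-moment EMA of squared gradients
        m_hat = m / (1 - b1 ** t)         # bias corrections (t starts at 1)
        v_hat = v / (1 - b2 ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v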

On the variance of the adaptive learning rate and beyond

L Liu, H Jiang, P He, W Chen, X Liu, J Gao… - arXiv preprint arXiv …, 2019 - arxiv.org
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …
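
The proposed rectification switches off the adaptive term while too few gradients have been seen for the second-moment estimate to be reliable, which removes the need for a warmup stage. A sketch of the rectified update as it is commonly implemented; treat the exact threshold and bias-correction details as approximate.

    import numpy as np

    def radam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """Rectified-Adam-style step (sketch): use the adaptive term only once the
        variance of the adaptive learning rate is tractable (later iterations)."""
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)
        rho_inf = 2.0 / (1.0 - b2) - 1.0
        rho_t = rho_inf - 2.0 * t * b2 ** t / (1.0 - b2 ** t)
        if rho_t > 4.0:   # enough history: rectified adaptive update
            v_hat = np.sqrt(v / (1 - b2 ** t))
            r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                          ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            theta = theta - lr * r_t * m_hat / (v_hat + eps)
        else:             # early iterations: fall back to un-adapted momentum SGD
            theta = theta - lr * m_hat
        return theta, m, v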

High-frequency component helps explain the generalization of convolutional neural networks

H Wang, X Wu, Z Huang… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We investigate the relationship between the frequency spectrum of image data and the
generalization behavior of convolutional neural networks (CNN). We first notice CNN's …
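
The analysis rests on decomposing an image into low- and high-frequency components with the Fourier transform and a radial mask; a minimal NumPy sketch of that decomposition, where the cut-off radius is an arbitrary illustrative value.

    import numpy as np

    def split_frequencies(img, radius=12):
        """Split a grayscale image (2-D array) into low- and high-frequency parts
        using a centered radial mask in the Fourier domain (illustrative sketch)."""
        f = np.fft.fftshift(np.fft.fft2(img))
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
        low_mask = dist <= radius
        low = np.real(np.fft.ifft2(np.fft.ifftshift(f * low_mask)))
        high = np.real(np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)))
        return low, high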

Activated gradients for deep neural networks

M Liu, L Chen, X Du, L Jin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks often suffer from poor performance or even training failure due to
ill-conditioning, the vanishing/exploding gradient problem, and the saddle point …
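
The general recipe is to pass each gradient through a bounded, monotone element-wise "gradient activation" before the optimizer step, compressing exploding components while leaving small ones comparatively intact. The sketch below uses arcsinh purely as an illustrative choice; it is not the specific activation function proposed in the paper.

    import numpy as np

    def activate_gradient(g, scale=1.0):
        """Illustrative gradient activation: logarithmic growth for large |g| via
        arcsinh (a stand-in example, not the paper's activation function)."""
        return np.arcsinh(scale * g) / scale

    def sgd_step_with_activated_grad(theta, g, lr=0.01):
        """Plain SGD step applied to the activated gradient."""
        return theta - lr * activate_gradient(g)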

Communication-efficient adaptive federated learning

Y Wang, L Lin, J Chen - International Conference on …, 2022 - proceedings.mlr.press
Federated learning is a machine learning training paradigm that enables clients to jointly
train models without sharing their own localized data. However, the implementation of …
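
A common recipe in this line of work combines FedAvg-style local training, compression of the client-to-server updates, and an adaptive Adam-like optimizer on the server that treats the averaged client update as a pseudo-gradient. The sketch below shows that loop shape with top-k sparsification; it is a generic illustration, not the specific algorithm of the paper.

    import numpy as np

    def top_k_sparsify(delta, k):
        """Compress a flattened (1-D) client update by keeping its k largest-magnitude entries."""
        out = np.zeros_like(delta)
        idx = np.argsort(np.abs(delta))[-k:]
        out[idx] = delta[idx]
        return out

    def adaptive_server_step(w, avg_delta, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-3):
        """Adaptive server update driven by the averaged (compressed) client update."""
        m = b1 * m + (1 - b1) * avg_delta
        v = b2 * v + (1 - b2) * avg_delta ** 2
        return w + lr * m / (np.sqrt(v) + eps), m, v

    # One communication round (sketch): each client sends a compressed update, the
    # server averages them and applies the adaptive step to the global model w.
    # avg_delta = np.mean([top_k_sparsify(d, k) for d in client_deltas], axis=0)
    # w, m, v = adaptive_server_step(w, avg_delta, m, v)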

Towards explaining the regularization effect of initial large learning rate in training neural networks

Y Li, C Wei, T Ma - Advances in neural information …, 2019 - proceedings.neurips.cc
Stochastic gradient descent with a large initial learning rate is widely used for training
modern neural net architectures. Although a small initial learning rate allows for faster …
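
The training recipe being analyzed is a large initial learning rate followed by annealing; a tiny sketch of such a step-decay schedule, with arbitrary illustrative constants.

    def step_decay_lr(epoch, base_lr=0.1, drop=0.1, every=30):
        """Large initial learning rate annealed by a multiplicative drop every
        `every` epochs (illustrative constants, not taken from the paper)."""
        return base_lr * drop ** (epoch // every)

    # epochs 0-29 -> 0.1, epochs 30-59 -> 0.01, epochs 60-89 -> 0.001
    assert abs(step_decay_lr(0) - 0.1) < 1e-12
    assert abs(step_decay_lr(45) - 0.01) < 1e-12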