In typical artificial neural networks, neurons adjust according to global calculations of a central processor, but in the brain, neurons and synapses self-adjust based on local …
L Wu, WJ Su - International Conference on Machine Learning, 2023 - proceedings.mlr.press
In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018). We start by revising existing stability …
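To make the dynamical-stability lens concrete, here is a minimal sketch of the textbook linear-stability condition for plain (full-batch) gradient descent near a minimizer; the notation (\(\eta\) for the learning rate, \(H\) for the Hessian, \(\lambda_{\max}\) for its largest eigenvalue) is ours, and this deterministic baseline is only the starting point that stability analyses of SGD refine, not the cited paper's exact criterion. Linearizing the update \(\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)\) around a minimizer \(\theta^*\) gives
\[
\theta_{t+1} - \theta^{*} \approx (I - \eta H)(\theta_{t} - \theta^{*}), \qquad H = \nabla^{2} L(\theta^{*}),
\]
so the iterates remain bounded near \(\theta^*\) only if
\[
\eta \, \lambda_{\max}(H) \le 2,
\]
i.e., sharper minima (larger \(\lambda_{\max}\)) can only be approached stably with smaller learning rates.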
H Levine, Y Tu - Proceedings of the National Academy of Sciences, 2024 - pnas.org
This article introduces a special issue on the interaction between the rapidly expanding field of machine learning and ongoing research in physics. The first half of the papers in this …
M Wang, C Ma - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant …
L Wu, M Wang, W Su - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
The phenomenon that stochastic gradient descent (SGD) favors flat minima has played a critical role in understanding the implicit regularization of SGD. In this paper, we provide an …
S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning. The noise encountered in these applications is different from that in many …
C Ma, L Ying - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
The multiplicative structure of parameters and input data in the first layer of neural networks is explored to build a connection between the landscape of the loss function with respect to …
A Sclocchi, M Wyart - Proceedings of the National Academy of Sciences, 2024 - National Academy of Sciences
Modern deep networks are trained with stochastic gradient descent (SGD) whose key hyperparameters are the number of data considered at each step or batch size B, and the …
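For orientation on how the batch size B enters, a commonly used modeling assumption (ours here, not necessarily the definition used in the cited paper) treats the mini-batch gradient as the full gradient plus zero-mean noise whose covariance shrinks with B:
\[
\theta_{t+1} = \theta_{t} - \eta\, g_{B}(\theta_{t}), \qquad \mathbb{E}[g_{B}] = \nabla L, \qquad \operatorname{Cov}[g_{B}] \approx \tfrac{1}{B}\,\Sigma(\theta_{t}),
\]
so the strength of the SGD noise is often summarized by an effective temperature proportional to the ratio \(\eta / B\), which is why the learning rate and the batch size are usually discussed together.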
F Mignacco, P Urbani - Journal of Statistical Mechanics: Theory and Experiment, 2022 - iopscience.iop.org
Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini-batch of samples is drawn from the training dataset …
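As a minimal illustration of the mini-batch procedure described above, here is a self-contained NumPy sketch; the function and variable names (sgd, lsq_grad, etc.) are ours and purely illustrative, not taken from any of the cited papers.

import numpy as np

def sgd(theta, grad_fn, data, lr=0.05, batch_size=32, steps=2000, seed=0):
    # Plain mini-batch SGD: at each step, draw a random mini-batch from the
    # training set and take a gradient step on the loss of that batch only.
    rng = np.random.default_rng(seed)
    n = len(data)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        theta = theta - lr * grad_fn(theta, data[idx])
    return theta

def lsq_grad(theta, batch):
    # Gradient of the mean squared error 0.5*||X theta - y||^2 / B over a
    # mini-batch stored row-wise as [features | target].
    X, y = batch[:, :-1], batch[:, -1]
    return X.T @ (X @ theta - y) / len(y)

# Toy usage: recover the parameters of a noisy linear model.
rng = np.random.default_rng(1)
X = rng.normal(size=(512, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.01 * rng.normal(size=512)
data = np.hstack([X, y[:, None]])
theta_hat = sgd(np.zeros(5), lsq_grad, data)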