SGD with large step sizes learns sparse features

M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …

PAC-Bayes compression bounds so tight that they can explain generalization

S Lotfi, M Finzi, S Kapoor… - Advances in …, 2022 - proceedings.neurips.cc
While there has been progress in developing non-vacuous generalization bounds for deep
neural networks, these bounds tend to be uninformative about why deep learning works. In …

When do flat minima optimizers work?

J Kaddour, L Liu, R Silva… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …

Can neural nets learn the same model twice? Investigating reproducibility and double descent from the decision boundary perspective

G Somepalli, L Fowl, A Bansal… - Proceedings of the …, 2022 - openaccess.thecvf.com
We discuss methods for visualizing neural network decision boundaries and decision
regions. We use these visualizations to investigate issues related to reproducibility and …

Subspace adversarial training

T Li, Y Wu, S Chen, K Fang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Single-step adversarial training (AT) has received wide attention as it has proved to be both
efficient and robust. However, a serious problem of catastrophic overfitting exists, i.e., the …

Stochastic collapse: How gradient noise attracts SGD dynamics towards simpler subnetworks

F Chen, D Kunin, A Yamamura… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives
overly expressive networks to much simpler subnetworks, thereby dramatically reducing the …

Why neural networks find simple solutions: The many regularizers of geometric complexity

B Dherin, M Munn, M Rosca… - Advances in Neural …, 2022 - proceedings.neurips.cc
In many contexts, simpler models are preferable to more complex models, and the control of
this model complexity is the goal of many methods in machine learning such as …

Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be

F Kunstner, J Chen, JW Lavington… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of the Adam optimizer on a wide array of architectures has made it the default
in settings where stochastic gradient descent (SGD) performs poorly. However, our …
