Geometry of optimization and implicit regularization in deep learning

B Neyshabur, R Tomioka, R Salakhutdinov… - arXiv preprint arXiv …, 2017 - arxiv.org
We argue that optimization plays a crucial role in the generalization of deep learning models
through implicit regularization. We do this by demonstrating that generalization ability is not …

Bayesian interpolation with deep linear networks

B Hanin, A Zlokapa - Proceedings of the National Academy of Sciences, 2023 - National Acad Sciences
Characterizing how neural network depth, width, and dataset size jointly impact model
quality is a central problem in deep learning theory. We give here a complete solution in the …

Generalization in deep networks: The role of distance from initialization

V Nagarajan, JZ Kolter - arXiv preprint arXiv:1901.01672, 2019 - arxiv.org
Why does training deep neural networks using stochastic gradient descent (SGD) result in a
generalization error that does not worsen with the number of parameters in the network? To …

Learning and generalization in overparameterized neural networks, going beyond two layers

Z Allen-Zhu, Y Li, Y Liang - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc

Benign overfitting and grokking in relu networks for xor cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

A note on over-smoothing for graph neural networks

C Cai, Y Wang - arXiv preprint arXiv:2006.13318, 2020 - arxiv.org
Graph Neural Networks (GNNs) have achieved considerable success on graph-structured data.
However, it is observed that the performance of graph neural networks does not improve as …

Implicit bias in leaky relu networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

Extrapolation and learning equations

G Martius, CH Lampert - arXiv preprint arXiv:1610.02995, 2016 - arxiv.org
In classical machine learning, regression is treated as a black box process of identifying a
suitable function from a hypothesis set without attempting to gain insight into the mechanism …

From tempered to benign overfitting in relu neural networks

G Kornowski, G Yehudai… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Overparameterized neural networks (NNs) are observed to generalize well even when
trained to perfectly fit noisy data. This phenomenon motivated a large body of work on "benign overfitting" …

Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization

K Wen, Z Li, T Ma - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Despite extensive studies, the underlying reason why overparameterized neural
networks can generalize remains elusive. Existing theory shows that common stochastic …