Geometry of optimization and implicit regularization in deep learning

B Neyshabur, R Tomioka, R Salakhutdinov… - arXiv preprint arXiv …, 2017 - arxiv.org
We argue that optimization plays a crucial role in the generalization of deep learning models
through implicit regularization. We do this by demonstrating that generalization ability is not …

Bayesian interpolation with deep linear networks

B Hanin, A Zlokapa - Proceedings of the National Academy of Sciences, 2023 - National Acad Sciences
Characterizing how neural network depth, width, and dataset size jointly impact model
quality is a central problem in deep learning theory. We give here a complete solution in the …

Generalization in deep networks: The role of distance from initialization

V Nagarajan, JZ Kolter - arXiv preprint arXiv:1901.01672, 2019 - arxiv.org
Why does training deep neural networks using stochastic gradient descent (SGD) result in a
generalization error that does not worsen with the number of parameters in the network? To …

Learning and generalization in overparameterized neural networks, going beyond two layers

Z Allen-Zhu, Y Li, Y Liang - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc

Benign overfitting and grokking in relu networks for xor cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

A note on over-smoothing for graph neural networks

C Cai, Y Wang - arXiv preprint arXiv:2006.13318, 2020 - arxiv.org
Graph Neural Networks (GNNs) have achieved considerable success on graph-structured data.
However, it is observed that the performance of graph neural networks does not improve as …

Implicit bias in leaky relu networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

Extrapolation and learning equations

G Martius, CH Lampert - arXiv preprint arXiv:1610.02995, 2016 - arxiv.org
In classical machine learning, regression is treated as a black box process of identifying a
suitable function from a hypothesis set without attempting to gain insight into the mechanism …

From tempered to benign overfitting in relu neural networks

G Kornowski, G Yehudai… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Overparameterized neural networks (NNs) are observed to generalize well even when
trained to perfectly fit noisy data. This phenomenon motivated a large body of work on "benign overfitting" …

Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization

K Wen, Z Li, T Ma - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Despite extensive studies, the underlying reason why overparameterized neural
networks can generalize remains elusive. Existing theory shows that common stochastic …