Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization

J Jiang, W Huang, M Zhang, T Suzuki, L Nie - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have demonstrated great power in the recent development of large
foundational models. In particular, the Vision Transformer (ViT) has brought revolutionary …

Benign overfitting in single-head attention

R Magen, S Shang, Z Xu, S Frei, W Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy
training data but still achieves near-optimal test performance, has been extensively studied …

Benign Overfitting for Regression with Trained Two-Layer ReLU Networks

J Park, P Bloebaum, SP Kasiviswanathan - arXiv preprint arXiv …, 2024 - arxiv.org
We study the least-squares regression problem with a two-layer fully-connected neural
network, with ReLU activation function, trained by gradient flow. Our first result is a …

Benign or not-benign overfitting in token selection of attention mechanism

K Sakamoto, I Sato - arXiv preprint arXiv:2409.17625, 2024 - arxiv.org
Modern over-parameterized neural networks can be trained to fit the training data perfectly
while still maintaining a high generalization performance. This "benign overfitting" …

Assessment of Water Hydrochemical Parameters Using Machine Learning Tools

I Malashin, V Nelyub, A Borodulin, A Gantimurov… - Sustainability, 2025 - mdpi.com
Access to clean water is a fundamental human need, yet millions of people worldwide still
lack access to safe drinking water. Traditional water quality assessments, though reliable …

Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality

S Wojtowytsch - Journal of Machine Learning Research, 2024 - jmlr.org
In this note, we study how neural networks with a single hidden layer and ReLU activation
interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin …