Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization

J Jiang, W Huang, M Zhang, T Suzuki, L Nie - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have demonstrated great power in the recent development of large
foundational models. In particular, the Vision Transformer (ViT) has brought revolutionary …

Benign overfitting in single-head attention

R Magen, S Shang, Z Xu, S Frei, W Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy
training data but still achieves near-optimal test performance, has been extensively studied …

Benign Overfitting for Regression with Trained Two-Layer ReLU Networks

J Park, P Bloebaum, SP Kasiviswanathan - arXiv preprint arXiv …, 2024 - arxiv.org
We study the least-squares regression problem with a two-layer fully-connected neural
network, with ReLU activation function, trained by gradient flow. Our first result is a …

Benign or not-benign overfitting in token selection of attention mechanism

K Sakamoto, I Sato - arXiv preprint arXiv:2409.17625, 2024 - arxiv.org
Modern over-parameterized neural networks can be trained to fit the training data perfectly
while still maintaining a high generalization performance. This "benign overfitting" …

Assessment of Water Hydrochemical Parameters Using Machine Learning Tools

I Malashin, V Nelyub, A Borodulin, A Gantimurov… - Sustainability, 2025 - mdpi.com
Access to clean water is a fundamental human need, yet millions of people worldwide still
lack access to safe drinking water. Traditional water quality assessments, though reliable …

Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality

S Wojtowytsch - Journal of Machine Learning Research, 2024 - jmlr.org
In this note, we study how neural networks with a single hidden layer and ReLU activation
interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin …