On the stepwise nature of self-supervised learning

JB Simon, M Knutins, L Ziyin, D Geisz… - International …, 2023 - proceedings.mlr.press
We present a simple picture of the training process of self-supervised learning methods with
dual deep networks. In our picture, these methods learn their high-dimensional embeddings …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …
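
As context for this entry, a sketch of the max-margin problem whose KKT points are meant here (standard formulation for a predictor $f(x;\theta)$ homogeneous in $\theta$; the paper's exact nonsmooth leaky-ReLU setting may differ):

$$\min_\theta \tfrac{1}{2}\|\theta\|^2 \quad \text{s.t.} \quad y_i f(x_i;\theta) \ge 1 \ \text{for all } i.$$

A KKT point requires multipliers $\lambda_i \ge 0$ with $\theta = \sum_i \lambda_i y_i \nabla_\theta f(x_i;\theta)$ and $\lambda_i = 0$ whenever $y_i f(x_i;\theta) > 1$ (complementary slackness).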

Organizing memories for generalization in complementary learning systems

W Sun, M Advani, N Spruston, A Saxe… - Nature …, 2023 - nature.com
Memorization and generalization are complementary cognitive processes that jointly
promote adaptive behavior. For example, animals should memorize safe routes to specific …

On the asymptotic learning curves of kernel ridge regression under power-law decay

Y Li, Q Lin - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The widely observed 'benign overfitting phenomenon' in the neural network literature challenges
the 'bias-variance trade-off' doctrine of statistical learning theory. Since …
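
For reference, a sketch of the standard setup behind such learning curves (the paper's precise assumptions may differ): kernel ridge regression returns

$$\hat f_\lambda = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^n \big(f(x_i) - y_i\big)^2 + \lambda \|f\|_{\mathcal{H}}^2,$$

and 'power-law decay' refers to the kernel's eigenvalues satisfying $\lambda_k \asymp k^{-\alpha}$ for some $\alpha > 1$; the learning curve is then the excess risk of $\hat f_\lambda$ as a function of the sample size $n$.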

Deep networks for system identification: a survey

G Pillonetto, A Aravkin, D Gedon, L Ljung, AH Ribeiro… - Automatica, 2025 - Elsevier
Deep learning is a topic of considerable current interest. The availability of massive data
collections and powerful software resources has led to an impressive amount of results in …

From tempered to benign overfitting in ReLU neural networks

G Kornowski, G Yehudai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Overparameterized neural networks (NNs) are observed to generalize well even when
trained to perfectly fit noisy data. This phenomenon motivated a large body of work on …
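
The tempered/benign terminology in this title follows a taxonomy of overfitting used in this literature (stated here as a sketch; the paper's formal definitions may differ): for an interpolating predictor $\hat f_n$ with risk $R(\hat f_n)$ and Bayes risk $R^*$, overfitting is benign if $R(\hat f_n) \to R^*$, tempered if $R(\hat f_n) \to c$ for some finite $c > R^*$ (e.g., a level set by the label-noise rate), and catastrophic if $R(\hat f_n) \to \infty$.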

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
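
A minimal runnable sketch of this kind of setting: XOR cluster data with a fraction of labels flipped, and a two-layer ReLU network trained by full-batch gradient descent on the logistic loss. All choices below (cluster means at $(\pm 1, \pm 1)$, 10% flip rate, width, learning rate, step count, fixed second layer) are illustrative assumptions, not the paper's experimental settings.

import numpy as np

rng = np.random.default_rng(0)

def xor_clusters(n, noise=0.1, sigma=0.3):
    # Four Gaussian clusters at (+-1, +-1); label = product of the sign pattern
    # (an XOR-style rule), with a `noise` fraction of labels flipped.
    centers = rng.choice([-1.0, 1.0], size=(n, 2))
    X = centers + sigma * rng.standard_normal((n, 2))
    y = np.sign(centers[:, 0] * centers[:, 1])
    flip = rng.random(n) < noise
    y[flip] *= -1
    return X, y

n, width, lr, steps = 200, 100, 0.5, 5000
X, y = xor_clusters(n)                       # noisy training set
Xte, yte = xor_clusters(2000, noise=0.0)     # clean test set

W = rng.standard_normal((2, width)) / np.sqrt(2)          # trained first layer
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)  # fixed second layer

for t in range(steps):
    f = np.maximum(X @ W, 0.0) @ a                        # network output
    # d/df of logistic loss log(1 + exp(-y f)), written overflow-safe:
    g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * f))
    # Backprop through the ReLU to the first layer only (second layer fixed,
    # a common simplification in this literature).
    dW = X.T @ ((g[:, None] * (X @ W > 0)) * a) / n
    W -= lr * dW

def acc(X, y):
    return np.mean(np.sign(np.maximum(X @ W, 0.0) @ a) == y)

print(f"train acc (noisy labels): {acc(X, y):.3f}")
print(f"test acc (clean labels):  {acc(Xte, yte):.3f}")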

Benign overfitting in deep neural networks under lazy training

Z Zhu, F Liu, G Chrysos, F Locatello… - … on Machine Learning, 2023 - proceedings.mlr.press
This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU
activation functions and proves that when the data distribution is well-separated, DNNs can …
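
'Lazy training' here refers to the regime in which the parameters barely move from their initialization, so the network is well approximated by its linearization (a standard characterization; the paper's exact conditions may differ):

$$f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),$$

which reduces the training dynamics to those of a kernel method with the neural tangent kernel at $\theta_0$.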

Near-interpolators: Rapid norm growth and the trade-off between interpolation and generalization

Y Wang, R Sonthalia, W Hu - International Conference on …, 2024 - proceedings.mlr.press
We study the generalization capability of nearly-interpolating linear regressors: $\beta$'s
whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random …
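
In symbols (paraphrasing the snippet; the paper's formal conditions may differ in detail): for noise level $\sigma^2$, a near-interpolator is a regressor $\beta$ whose training error $\tau = \frac{1}{n}\|X\beta - y\|_2^2$ satisfies $0 < \tau \le \sigma^2$, i.e., it fits the labels below the noise floor without interpolating them exactly.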

Generalization in kernel regression under realistic assumptions

D Barzilai, O Shamir - arXiv preprint arXiv:2312.15995, 2023 - arxiv.org
It is by now well-established that modern over-parameterized models seem to elude the
bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to …