On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Transformers learn through gradual rank increase

E Abbe, S Bengio, E Boix-Adsera… - Advances in …, 2024 - proceedings.neurips.cc
We identify incremental learning dynamics in transformers, where the difference between
trained and initial weights progressively increases in rank. We rigorously prove this occurs …
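
As a rough illustration of the kind of measurement involved (a minimal sketch, not the authors' code; the toy attention layer, regression task, and rank tolerance are all assumptions), one can snapshot the initial weights and track the numerical rank of W_t - W_0 during training:

```python
# Hypothetical sketch: track the numerical rank of the weight update W_t - W_0
# while training a tiny one-head attention layer on a toy regression task.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 32, 256
X = torch.randn(n, 8, d)                      # 256 sequences, length 8, dim 32
Y = X.mean(dim=1)                             # toy target: mean-pool of tokens

attn = nn.MultiheadAttention(embed_dim=d, num_heads=1, batch_first=True)
W0 = attn.in_proj_weight.detach().clone()     # snapshot of initial Q/K/V weights
opt = torch.optim.Adam(attn.parameters(), lr=1e-3)

def numerical_rank(M, tol=1e-2):
    s = torch.linalg.svdvals(M)               # singular values, descending
    return int((s > tol * s[0]).sum())

for step in range(2001):
    out, _ = attn(X, X, X)
    loss = ((out.mean(dim=1) - Y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 400 == 0:
        delta = attn.in_proj_weight.detach() - W0
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"rank(W_t - W_0) ~ {numerical_rank(delta)}")
```

Whether the rank grows in clearly separated increments depends on the architecture, task, and initialization scale; the paper proves it in a specific regime.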

Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data

Y Kou, Z Chen, Q Gu - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …
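
For the linear case this bias is easy to probe numerically (a minimal sketch on separable synthetic data, not the paper's setting; using LinearSVC with a large C as a hard-margin surrogate is an assumption): after long training on the logistic loss, the gradient-descent iterate should align in direction with the max-margin solution, which is the KKT point of the margin-maximization problem.

```python
# Hypothetical sketch: compare the direction of a logistic-loss GD iterate with
# the hard-margin SVM direction on linearly separable data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                       # separable labels in {-1, +1}

w = np.zeros(d)
lr = 0.1
for _ in range(200_000):                      # long training lets ||w|| grow
    margins = y * (X @ w)
    coef = y / (1 + np.exp(np.clip(margins, None, 50)))   # y * sigmoid(-margin)
    grad = -(X * coef[:, None]).mean(axis=0)
    w -= lr * grad

svm = LinearSVC(C=1e6, loss="hinge", fit_intercept=False, max_iter=100_000).fit(X, y)
w_svm = svm.coef_.ravel()

cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print(f"cosine(GD direction, max-margin direction) = {cos:.4f}")
```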

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …
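
A toy version of the tension (a hedged sketch, not the paper's construction; the Gaussian-cluster data, perturbation budget, and FGSM-style attack are assumptions) trains a two-layer ReLU network that generalizes well on clean data and then measures accuracy under a small adversarial perturbation:

```python
# Hypothetical sketch: good clean accuracy does not imply robustness to small
# loss-increasing perturbations of the test inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 100
mu = torch.zeros(d); mu[0] = 3.0

def sample(n):
    y = torch.randint(0, 2, (n,)).float() * 2 - 1
    return y[:, None] * mu + torch.randn(n, d), y

Xtr, ytr = sample(500)
Xte, yte = sample(2000)

net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(3000):
    loss = torch.nn.functional.softplus(-ytr * net(Xtr).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def accuracy(X, y):
    return ((net(X).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean().item()

# FGSM-style attack: move each test point by eps in the loss-increasing direction.
eps = 0.5
Xadv = Xte.clone().requires_grad_(True)
torch.nn.functional.softplus(-yte * net(Xadv).squeeze(-1)).mean().backward()
Xadv = (Xte + eps * Xadv.grad.sign()).detach()

print(f"clean test acc: {accuracy(Xte, yte):.2f}, "
      f"acc under eps={eps} perturbation: {accuracy(Xadv, yte):.2f}")
```

How much accuracy drops depends on eps and the data scale; the point of the sketch is only that the two quantities can diverge.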

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
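
A minimal sketch of such an experiment (not the paper's setup; the cluster means, noise rate, width, and training length are assumptions) fits a wide two-layer ReLU network on XOR cluster data with some flipped labels and compares train accuracy on the noisy labels with test accuracy on clean labels:

```python
# Hypothetical sketch: interpolate noisy XOR cluster data with a wide two-layer
# ReLU network and check generalization on clean test points.
import torch
import torch.nn as nn

torch.manual_seed(0)

def xor_clusters(n, d=20, noise_rate=0.1):
    # Clusters at +/-mu1 and +/-mu2; the label is the XOR of the two cluster signs.
    mu1, mu2 = torch.zeros(d), torch.zeros(d)
    mu1[0] = 4.0; mu2[1] = 4.0
    s1 = torch.randint(0, 2, (n,)) * 2 - 1
    s2 = torch.randint(0, 2, (n,)) * 2 - 1
    X = s1[:, None] * mu1 + s2[:, None] * mu2 + torch.randn(n, d)
    y = (s1 * s2).float()
    flip = torch.rand(n) < noise_rate
    return X, y, torch.where(flip, -y, y)

Xtr, _, ytr = xor_clusters(200)                       # noisy training labels
Xte, yte, _ = xor_clusters(2000, noise_rate=0.0)      # clean test labels

net = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(5000):
    loss = torch.nn.functional.softplus(-ytr * net(Xtr).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def acc(X, y):
    return ((net(X).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean().item()

print(f"train acc on noisy labels: {acc(Xtr, ytr):.2f}, "
      f"test acc on clean labels: {acc(Xte, yte):.2f}")
```

In this toy setup the network typically fits the noisy labels perfectly while keeping good clean accuracy, though the exact behavior depends on width, noise rate, and training time.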

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …
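
One common way to look for such structure (a hedged sketch, not the paper's method; the one-hidden-layer MLP, one-hot encoding, and modulus are assumptions) is to train a small network on modular addition and inspect the Fourier spectrum of its first-layer weights:

```python
# Hypothetical sketch: train an MLP on (a + b) mod p with one-hot inputs and look
# at the Fourier spectrum of the weights reading the "a" block.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 23
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))   # all (a, b)
X = torch.cat([torch.nn.functional.one_hot(pairs[:, 0], p),
               torch.nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

net = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(5000):
    loss = loss_fn(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()

acc = (net(X).argmax(dim=1) == y).float().mean().item()
W_a = net[0].weight.detach()[:, :p]            # weights reading the "a" one-hot block
spectrum = torch.fft.rfft(W_a, dim=1).abs().mean(dim=0)
print(f"train acc: {acc:.3f}")
print("mean |FFT| over neurons per frequency:", spectrum.round(decimals=2))
```

Fourier-circuit analyses predict the spectrum concentrates on a few frequencies; whether this sketch reproduces that cleanly depends on the modulus, width, and regularization.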

Random feature amplification: Feature learning and generalization in neural networks

S Frei, NS Chatterji, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In this work, we provide a characterization of the feature-learning process in two-layer ReLU
networks trained by gradient descent on the logistic loss following random initialization. We …
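
A minimal sketch of the amplification phenomenon (not the paper's analyzed regime; the signal direction mu, noise model, and learning rate are assumptions): at random initialization each neuron has only a small correlation with the signal direction, and gradient descent on the logistic loss amplifies it over training.

```python
# Hypothetical sketch: track how hidden-neuron weights align with the signal
# direction mu while training a two-layer ReLU network with GD on the logistic loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n, width = 50, 500, 200
mu = torch.zeros(d); mu[0] = 2.0
y = torch.randint(0, 2, (n,)).float() * 2 - 1
X = y[:, None] * mu + torch.randn(n, d)

net = nn.Sequential(nn.Linear(d, width, bias=False), nn.ReLU(),
                    nn.Linear(width, 1, bias=False))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

def mean_alignment():
    W = net[0].weight.detach()                 # (width, d)
    return torch.nn.functional.cosine_similarity(W, mu.expand_as(W), dim=1).abs().mean().item()

for step in range(2001):
    loss = torch.nn.functional.softplus(-y * net(X).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.3f}  mean |cos(w_j, mu)| = {mean_alignment():.3f}")
```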

Neural Redshift: Random Networks are not Random Functions

D Teney, AM Nicolicioiu, V Hartmann… - Proceedings of the …, 2024 - openaccess.thecvf.com
Our understanding of the generalization capabilities of neural networks (NNs) is still
incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but …
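
A crude way to see this point (a hedged sketch, not the paper's protocol; the 1-D input grid, architecture, and "low-frequency energy" statistic are assumptions) is to compare the Fourier spectra of untrained MLPs with white noise of the same length: randomly initialized ReLU networks already concentrate their energy at low frequencies, before any gradient descent.

```python
# Hypothetical sketch: untrained MLPs restricted to a 1-D grid are much smoother
# (more low-frequency) than random functions of the same variance.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)
grid = torch.linspace(-3, 3, 512).unsqueeze(1)

def random_mlp(width=256, depth=3):
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers += [nn.Linear(d_in, 1)]
    return nn.Sequential(*layers)

def low_freq_energy(signal, k=10):
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    return spec[:k].sum() / spec.sum()

net_frac, noise_frac = [], []
for _ in range(20):
    with torch.no_grad():
        f = random_mlp()(grid).squeeze().numpy()
    net_frac.append(low_freq_energy(f))
    noise_frac.append(low_freq_energy(np.random.randn(512)))

print(f"fraction of energy in lowest 10 frequencies: "
      f"random MLP ~ {np.mean(net_frac):.2f}, white noise ~ {np.mean(noise_frac):.2f}")
```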