Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …

From tempered to benign overfitting in ReLU neural networks

G Kornowski, G Yehudai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Overparameterized neural networks (NNs) are observed to generalize well even when
trained to perfectly fit noisy data. This phenomenon has motivated a large body of work on …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …

Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs

D Chistikov, M Englert, R Lazic - Advances in Neural …, 2023 - proceedings.neurips.cc
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden-layer ReLU network of any width by gradient flow from a small initialisation …

Feature emergence via margin maximization: case studies in algebraic tasks

D Morwani, BL Edelman, CA Oncescu, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding the internal representations learned by neural networks is a cornerstone
challenge in the science of machine learning. While there have been significant recent …

Vanishing gradients in reinforcement finetuning of language models

N Razin, H Zhou, O Saremi, V Thilak, A Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained language models are commonly aligned with human preferences and
downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly …

Precise asymptotic generalization for multiclass classification with overparameterized linear models

D Wu, A Sahai - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
We study the asymptotic generalization of an overparameterized linear model for multiclass
classification under the Gaussian covariates bi-level model introduced in Subramanian et …

Noisy interpolation learning with shallow univariate ReLU networks

N Joshi, G Vardi, N Srebro - arXiv preprint arXiv:2307.15396, 2023 - arxiv.org
We study the asymptotic overfitting behavior of interpolation with minimum-norm ($\ell_2$ norm of
the weights) two-layer ReLU networks for noisy univariate regression. We show that …