Representations and generalization in artificial and brain neural networks

Q Li, B Sorscher, H Sompolinsky - Proceedings of the National Academy of …, 2024 - pnas.org
Humans and animals excel at generalizing from limited data, a capability yet to be fully
replicated in artificial intelligence. This perspective investigates generalization in biological …

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
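
A minimal sketch of the data-generating process this abstract describes. The ReLU link and the isotropic Gaussian inputs are illustrative assumptions, not fixed by the paper; the point is only that the label depends on the input through a single projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000

# Unknown one-dimensional projection of the input.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

# Unknown univariate "link" function (ReLU is an assumed example).
g = lambda z: np.maximum(z, 0.0)

X = rng.normal(size=(n, d))   # isotropic Gaussian inputs (assumed)
y = g(X @ theta)              # labels depend on x only through <theta, x>
```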

Feature learning via mean-field langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural …
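
A toy sketch of noisy gradient descent on a sparse-parity task in the spirit of this abstract: Langevin updates (gradient step plus Gaussian noise) on a two-layer network with mean-field 1/width output scaling. The architecture, activation, and all hyperparameters are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, width = 20, 3, 2000, 256
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(X[:, :k], axis=1)                 # k-sparse parity on the first k coordinates

W = rng.normal(size=(width, d)) / np.sqrt(d)  # one "particle" per hidden neuron
a = rng.normal(size=width) / width            # mean-field 1/width output scaling

lr, lam, temp = 0.05, 1e-4, 1e-5              # step size, L2 penalty, noise temperature
for _ in range(2000):
    Z = np.tanh(X @ W.T)                      # (n, width) hidden activations
    err = Z @ a - y                           # squared-loss residuals
    grad_a = Z.T @ err / n + lam * a
    grad_W = ((err[:, None] * (1.0 - Z**2) * a).T @ X) / n + lam * W
    # Langevin step: gradient descent plus Gaussian noise; the noise/penalty
    # pair mimics the entropic regularization of the mean-field objective.
    a += -lr * grad_a + np.sqrt(2 * lr * temp) * rng.normal(size=a.shape)
    W += -lr * grad_W + np.sqrt(2 * lr * temp) * rng.normal(size=W.shape)

print(np.mean(np.sign(np.tanh(X @ W.T) @ a) == y))   # training accuracy
```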

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
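
A rough illustration of the phenomenon in this abstract: after one large-batch gradient step on the first layer, the student neurons pick up a component along the teacher direction. The single-index ReLU target, the frozen second layer, and the learning rate growing with the dimension are all assumed choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, width = 200, 20_000, 64                  # large batch: n >> d (assumed sizes)

wstar = rng.normal(size=d)
wstar /= np.linalg.norm(wstar)                 # teacher direction
X = rng.normal(size=(n, d))
y = np.maximum(X @ wstar, 0.0)                 # assumed single-index target (ReLU link)

W = rng.normal(size=(width, d)) / np.sqrt(d)   # student first layer
a = rng.choice([-1.0, 1.0], size=width) / width

# One large-batch gradient step on the first layer (second layer frozen).
Z = np.tanh(X @ W.T)
err = Z @ a - y
grad_W = ((err[:, None] * (1.0 - Z**2) * a).T @ X) / n
W1 = W - d * grad_W                            # "giant" step: rate scaling with d (assumed)

def mean_alignment(V):
    cs = (V @ wstar) / np.linalg.norm(V, axis=1)
    return np.abs(cs).mean()

print(mean_alignment(W), mean_alignment(W1))   # alignment with the teacher jumps
```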

On learning Gaussian multi-index models with gradient flow

A Bietti, J Bruna, L Pillaud-Vivien - arXiv preprint arXiv:2310.19793, 2023 - arxiv.org
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
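
The setup this abstract describes, written out (notation assumed for illustration): an unknown low-rank linear map composed with an unknown link function.

```latex
\[
  f^*(x) \;=\; g\!\left(U^{\top} x\right),
  \qquad U \in \mathbb{R}^{d \times r},\quad r \ll d,
  \qquad x \sim \mathcal{N}(0, I_d),
\]
% with r = 1 recovering the single-index case above.
```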

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

L Arnaboldi, L Stephan, F Krzakala… - The Thirty Sixth …, 2023 - proceedings.mlr.press
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a
two-layer neural network trained on Gaussian data and labels generated by a similar …
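
A minimal sketch of the one-pass setting: each SGD step uses a fresh Gaussian sample, and the quantities that the limiting ODEs track are low-dimensional overlaps between student and teacher weights. The committee-machine architecture, matched widths, and learning-rate scale are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, width, lr, steps = 500, 4, 0.5, 20_000          # assumed sizes

Wstar = rng.normal(size=(width, d)) / np.sqrt(d)   # teacher (same width: an assumption)
W = rng.normal(size=(width, d)) / np.sqrt(d)       # student

for t in range(steps):
    x = rng.normal(size=d)                         # fresh Gaussian sample (one-pass SGD)
    h = np.tanh(W @ x)
    err = h.sum() - np.tanh(Wstar @ x).sum()       # committee-machine style outputs
    W -= (lr / d) * err * (1.0 - h**2)[:, None] * x[None, :]
    if t % 5000 == 0:
        # Order parameters the limiting ODEs track: student-teacher overlaps,
        # which concentrate on a deterministic trajectory as d grows.
        M = W @ Wstar.T
        print(t, np.round(np.diag(M), 3))
```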

On the different regimes of stochastic gradient descent

A Sclocchi, M Wyart - … of the National Academy of Sciences, 2024 - National Acad Sciences
Modern deep networks are trained with stochastic gradient descent (SGD), whose key
hyperparameters are the number of data considered at each step, or batch size B, and the …
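
A bare-bones loop exposing the two knobs the abstract names, run here on an assumed linear regression task. The comparison at fixed ratio eta/B reflects the temperature-like summary of SGD noise often used in this literature; the task and step counts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 50, 10_000
X = rng.normal(size=(n, d))
wstar = rng.normal(size=d) / np.sqrt(d)
y = X @ wstar                               # assumed linear target, for illustration only

def sgd(eta, B, steps=2000):
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=B)    # minibatch of size B
        g = X[idx].T @ (X[idx] @ w - y[idx]) / B
        w -= eta * g                        # learning rate eta
    return np.linalg.norm(w - wstar)

# Runs at a fixed ratio eta/B, a common temperature-like summary of SGD noise.
for eta, B in [(0.005, 1), (0.05, 10), (0.5, 100)]:
    print(f"eta={eta}, B={B}: error {sgd(eta, B):.4f}")
```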

Rigorous dynamical mean-field theory for stochastic gradient descent methods

C Gerbelot, E Troiani, F Mignacco, F Krzakala… - SIAM Journal on …, 2024 - SIAM
We prove closed-form equations for the exact high-dimensional asymptotics of a family of
first-order gradient-based methods, learning an estimator (e.g., M-estimator, shallow neural …
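
A schematic, with illustrative notation that is not necessarily the paper's, of what a dynamical mean-field reduction of such dynamics looks like: a first-order update on d parameters is summarized in the high-dimensional limit by two-time order parameters obeying closed self-consistent equations.

```latex
\[
  w^{t+1} \;=\; w^{t} \;-\; \eta \,\nabla_w \mathcal{L}(w^{t}),
\]
% summarized as d -> infinity by, e.g., correlation and response functions
\[
  C(t,s) \;=\; \frac{1}{d}\,\big\langle w^{t}\!\cdot w^{s} \big\rangle,
  \qquad
  R(t,s) \;=\; \frac{1}{d}\sum_{i}\frac{\partial\,\langle w_i^{t}\rangle}{\partial h_i^{s}},
\]
% which satisfy the closed-form self-consistent equations the abstract refers to.
```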

Bottleneck structure in learned features: Low-dimension vs regularity tradeoff

A Jacot - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Previous work has shown that DNNs with large depth $L$ and $L_2$-regularization are
biased towards learning low-dimensional representations of the inputs, which can be …
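
A small diagnostic one might use to test this low-dimensionality bias on a trained network's hidden features. The participation-ratio definition of effective rank is an assumed choice of proxy; here it is demonstrated on synthetic generic vs. bottlenecked feature matrices.

```python
import numpy as np

def effective_rank(H, eps=1e-12):
    """Participation ratio of squared singular values of a feature matrix H
    (rows = inputs, columns = hidden units); one common effective-rank proxy."""
    s2 = np.linalg.svd(H - H.mean(axis=0), compute_uv=False) ** 2
    p = s2 / (s2.sum() + eps)
    return 1.0 / (np.sum(p**2) + eps)

rng = np.random.default_rng(4)
n, h, r = 1000, 128, 3
full = rng.normal(size=(n, h))                             # generic features
low = rng.normal(size=(n, r)) @ rng.normal(size=(r, h))    # rank-r bottleneck features
print(effective_rank(full), effective_rank(low))           # >> r vs. ~ r
```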

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B Simsek, A Bendjeddou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Any continuous function $f^*$ can be approximated arbitrarily well by a neural
network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a …
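
A toy numerical comparison of the two constructions the title asks about, for an under-parameterized ReLU student against a wider ReLU teacher. The teacher weights, group sizes, and output-weight choices are illustrative assumptions; the paper's analysis determines which construction is optimal, which this sketch only probes empirically.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, k, n = 30, 8, 4, 100_000                 # teacher has m neurons, student only k < m
relu = lambda z: np.maximum(z, 0.0)

Wt = rng.normal(size=(m, d))                   # teacher weights (unit output weights)
X = rng.normal(size=(n, d))
y = relu(X @ Wt.T).sum(axis=1)

def mse(Ws, a):
    return np.mean((relu(X @ Ws.T) @ a - y) ** 2)

# "Copy": the student reproduces k teacher neurons exactly and ignores the rest.
copy_err = mse(Wt[:k], np.ones(k))

# "Average": each student neuron is the mean of a group of teacher neurons,
# with output weight equal to the group size (an illustrative construction).
groups = np.array_split(np.arange(m), k)
Wavg = np.stack([Wt[g].mean(axis=0) for g in groups])
avg_err = mse(Wavg, np.array([len(g) for g in groups], dtype=float))

print(copy_err, avg_err)                       # which wins depends on the geometry
```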