Representations and generalization in artificial and brain neural networks

Q Li, B Sorscher, H Sompolinsky - Proceedings of the National Academy of …, 2024 - pnas.org
Humans and animals excel at generalizing from limited data, a capability yet to be fully
replicated in artificial intelligence. This perspective investigates generalization in biological …

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
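
A minimal sketch of the data-generating process this abstract describes. The ReLU link and the isotropic Gaussian inputs are illustrative assumptions, not fixed by the paper; the point is only that the label depends on the input through a single projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000

# Unknown one-dimensional projection of the input.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

# Unknown univariate "link" function (ReLU is an assumed example).
g = lambda z: np.maximum(z, 0.0)

X = rng.normal(size=(n, d))   # isotropic Gaussian inputs (assumed)
y = g(X @ theta)              # labels depend on x only through <theta, x>
```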

Feature learning via mean-field langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural …
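
A toy sketch of noisy gradient descent on a sparse-parity task in the spirit of this abstract: Langevin updates (gradient step plus Gaussian noise) on a two-layer network with mean-field 1/width output scaling. The architecture, activation, and all hyperparameters are assumptions for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, width = 20, 3, 2000, 256
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(X[:, :k], axis=1)                 # k-sparse parity on the first k coordinates

W = rng.normal(size=(width, d)) / np.sqrt(d)  # one "particle" per hidden neuron
a = rng.normal(size=width) / width            # mean-field 1/width output scaling

lr, lam, temp = 0.05, 1e-4, 1e-5              # step size, L2 penalty, noise temperature
for _ in range(2000):
    Z = np.tanh(X @ W.T)                      # (n, width) hidden activations
    err = Z @ a - y                           # squared-loss residuals
    grad_a = Z.T @ err / n + lam * a
    grad_W = ((err[:, None] * (1.0 - Z**2) * a).T @ X) / n + lam * W
    # Langevin step: gradient descent plus Gaussian noise; the noise/penalty
    # pair mimics the entropic regularization of the mean-field objective.
    a += -lr * grad_a + np.sqrt(2 * lr * temp) * rng.normal(size=a.shape)
    W += -lr * grad_W + np.sqrt(2 * lr * temp) * rng.normal(size=W.shape)

print(np.mean(np.sign(np.tanh(X @ W.T) @ a) == y))   # training accuracy
```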

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
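
A rough illustration of the phenomenon in this abstract: after one large-batch gradient step on the first layer, the student neurons pick up a component along the teacher direction. The single-index ReLU target, the frozen second layer, and the learning rate growing with the dimension are all assumed choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, width = 200, 20_000, 64                  # large batch: n >> d (assumed sizes)

wstar = rng.normal(size=d)
wstar /= np.linalg.norm(wstar)                 # teacher direction
X = rng.normal(size=(n, d))
y = np.maximum(X @ wstar, 0.0)                 # assumed single-index target (ReLU link)

W = rng.normal(size=(width, d)) / np.sqrt(d)   # student first layer
a = rng.choice([-1.0, 1.0], size=width) / width

# One large-batch gradient step on the first layer (second layer frozen).
Z = np.tanh(X @ W.T)
err = Z @ a - y
grad_W = ((err[:, None] * (1.0 - Z**2) * a).T @ X) / n
W1 = W - d * grad_W                            # "giant" step: rate scaling with d (assumed)

def mean_alignment(V):
    cs = (V @ wstar) / np.linalg.norm(V, axis=1)
    return np.abs(cs).mean()

print(mean_alignment(W), mean_alignment(W1))   # alignment with the teacher jumps
```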

On learning Gaussian multi-index models with gradient flow

A Bietti, J Bruna, L Pillaud-Vivien - arXiv preprint arXiv:2310.19793, 2023 - arxiv.org
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
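
The setup this abstract describes, written out (notation assumed for illustration): an unknown low-rank linear map composed with an unknown link function.

```latex
\[
  f^*(x) \;=\; g\!\left(U^{\top} x\right),
  \qquad U \in \mathbb{R}^{d \times r},\quad r \ll d,
  \qquad x \sim \mathcal{N}(0, I_d),
\]
% with r = 1 recovering the single-index case above.
```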

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

L Arnaboldi, L Stephan, F Krzakala… - The Thirty Sixth …, 2023 - proceedings.mlr.press
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a
two-layer neural network trained on Gaussian data and labels generated by a similar …
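
A minimal sketch of the one-pass setting: each SGD step uses a fresh Gaussian sample, and the quantities that the limiting ODEs track are low-dimensional overlaps between student and teacher weights. The committee-machine architecture, matched widths, and learning-rate scale are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, width, lr, steps = 500, 4, 0.5, 20_000          # assumed sizes

Wstar = rng.normal(size=(width, d)) / np.sqrt(d)   # teacher (same width: an assumption)
W = rng.normal(size=(width, d)) / np.sqrt(d)       # student

for t in range(steps):
    x = rng.normal(size=d)                         # fresh Gaussian sample (one-pass SGD)
    h = np.tanh(W @ x)
    err = h.sum() - np.tanh(Wstar @ x).sum()       # committee-machine style outputs
    W -= (lr / d) * err * (1.0 - h**2)[:, None] * x[None, :]
    if t % 5000 == 0:
        # Order parameters the limiting ODEs track: student-teacher overlaps,
        # which concentrate on a deterministic trajectory as d grows.
        M = W @ Wstar.T
        print(t, np.round(np.diag(M), 3))
```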

On the different regimes of stochastic gradient descent

A Sclocchi, M Wyart - … of the National Academy of Sciences, 2024 - National Acad Sciences
Modern deep networks are trained with stochastic gradient descent (SGD), whose key
hyperparameters are the number of data considered at each step, or batch size B, and the …
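
A bare-bones loop exposing the two knobs the abstract names, run here on an assumed linear regression task. The comparison at fixed ratio eta/B reflects the temperature-like summary of SGD noise often used in this literature; the task and step counts are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 50, 10_000
X = rng.normal(size=(n, d))
wstar = rng.normal(size=d) / np.sqrt(d)
y = X @ wstar                               # assumed linear target, for illustration only

def sgd(eta, B, steps=2000):
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=B)    # minibatch of size B
        g = X[idx].T @ (X[idx] @ w - y[idx]) / B
        w -= eta * g                        # learning rate eta
    return np.linalg.norm(w - wstar)

# Runs at a fixed ratio eta/B, a common temperature-like summary of SGD noise.
for eta, B in [(0.005, 1), (0.05, 10), (0.5, 100)]:
    print(f"eta={eta}, B={B}: error {sgd(eta, B):.4f}")
```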

Rigorous dynamical mean-field theory for stochastic gradient descent methods

C Gerbelot, E Troiani, F Mignacco, F Krzakala… - SIAM Journal on …, 2024 - SIAM
We prove closed-form equations for the exact high-dimensional asymptotics of a family of
first-order gradient-based methods, learning an estimator (e.g., M-estimator, shallow neural …
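
A schematic, with illustrative notation that is not necessarily the paper's, of what a dynamical mean-field reduction of such dynamics looks like: a first-order update on d parameters is summarized in the high-dimensional limit by two-time order parameters obeying closed self-consistent equations.

```latex
\[
  w^{t+1} \;=\; w^{t} \;-\; \eta \,\nabla_w \mathcal{L}(w^{t}),
\]
% summarized as d -> infinity by, e.g., correlation and response functions
\[
  C(t,s) \;=\; \frac{1}{d}\,\big\langle w^{t}\!\cdot w^{s} \big\rangle,
  \qquad
  R(t,s) \;=\; \frac{1}{d}\sum_{i}\frac{\partial\,\langle w_i^{t}\rangle}{\partial h_i^{s}},
\]
% which satisfy the closed-form self-consistent equations the abstract refers to.
```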

Bottleneck structure in learned features: Low-dimension vs regularity tradeoff

A Jacot - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
Previous work has shown that DNNs with large depth $L$ and $L_2$-regularization are
biased towards learning low-dimensional representations of the inputs, which can be …
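
A small diagnostic one might use to test this low-dimensionality bias on a trained network's hidden features. The participation-ratio definition of effective rank is an assumed choice of proxy; here it is demonstrated on synthetic generic vs. bottlenecked feature matrices.

```python
import numpy as np

def effective_rank(H, eps=1e-12):
    """Participation ratio of squared singular values of a feature matrix H
    (rows = inputs, columns = hidden units); one common effective-rank proxy."""
    s2 = np.linalg.svd(H - H.mean(axis=0), compute_uv=False) ** 2
    p = s2 / (s2.sum() + eps)
    return 1.0 / (np.sum(p**2) + eps)

rng = np.random.default_rng(4)
n, h, r = 1000, 128, 3
full = rng.normal(size=(n, h))                             # generic features
low = rng.normal(size=(n, r)) @ rng.normal(size=(r, h))    # rank-r bottleneck features
print(effective_rank(full), effective_rank(low))           # >> r vs. ~ r
```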

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B Simsek, A Bendjeddou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Any continuous function $f^*$ can be approximated arbitrarily well by a neural
network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a …
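
A toy numerical comparison of the two constructions the title asks about, for an under-parameterized ReLU student against a wider ReLU teacher. The teacher weights, group sizes, and output-weight choices are illustrative assumptions; the paper's analysis determines which construction is optimal, which this sketch only probes empirically.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, k, n = 30, 8, 4, 100_000                 # teacher has m neurons, student only k < m
relu = lambda z: np.maximum(z, 0.0)

Wt = rng.normal(size=(m, d))                   # teacher weights (unit output weights)
X = rng.normal(size=(n, d))
y = relu(X @ Wt.T).sum(axis=1)

def mse(Ws, a):
    return np.mean((relu(X @ Ws.T) @ a - y) ** 2)

# "Copy": the student reproduces k teacher neurons exactly and ignores the rest.
copy_err = mse(Wt[:k], np.ones(k))

# "Average": each student neuron is the mean of a group of teacher neurons,
# with output weight equal to the group size (an illustrative construction).
groups = np.array_split(np.arange(m), k)
Wavg = np.stack([Wt[g].mean(axis=0) for g in groups])
avg_err = mse(Wavg, np.array([len(g) for g in groups], dtype=float))

print(copy_err, avg_err)                       # which wins depends on the geometry
```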