A dynamical model of neural scaling laws

B Bordelon, A Atanasov, C Pehlevan - arXiv preprint arXiv:2402.01092, 2024 - arxiv.org
On a variety of tasks, the performance of neural networks predictably improves with training
time, dataset size and model size across many orders of magnitude. This phenomenon is …
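
A hedged illustration of what "predictable improvement" means in practice: fitting a power-law-plus-offset ansatz to a loss curve. The ansatz, constants, and data below are illustrative, not taken from the paper.

```python
# Fit a power-law scaling ansatz L(t) = a * t^(-alpha) + L_inf to a synthetic
# loss curve, the kind of summary behind "predictable improvement" claims.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(t, a, alpha, L_inf):
    return a * t ** (-alpha) + L_inf

t = np.logspace(1, 5, 50)                         # training steps, 10 .. 1e5
rng = np.random.default_rng(0)
loss = scaling_law(t, 5.0, 0.3, 0.1) * (1 + 0.02 * rng.standard_normal(t.size))

(a_hat, alpha_hat, L_inf_hat), _ = curve_fit(scaling_law, t, loss, p0=[1.0, 0.5, 0.0])
print(f"fitted exponent alpha ~ {alpha_hat:.3f}, irreducible loss ~ {L_inf_hat:.3f}")
```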

Hitting the high-dimensional notes: An ode for sgd learning dynamics on glms and multi-index models

E Collins-Woodfin, C Paquette… - Information and Inference: A Journal of the IMA, 2024 - academic.oup.com
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. …
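
A minimal sketch of the setting (assumed, not the paper's code): online SGD on a single-index model, tracking the overlap m = ⟨w, w*⟩/d, the kind of low-dimensional summary statistic whose evolution the limiting ODEs describe.

```python
# Online (one-pass) SGD on a single-index model y = tanh(<w*, x>/sqrt(d)),
# printing the overlap m = <w, w*>/d that the high-dimensional ODEs track.
import numpy as np

d, steps, lr = 1000, 20000, 0.5
rng = np.random.default_rng(1)
w_star = rng.standard_normal(d)
w_star *= np.sqrt(d) / np.linalg.norm(w_star)      # normalize so ||w*|| = sqrt(d)
w = rng.standard_normal(d)

for t in range(steps):
    x = rng.standard_normal(d)                     # fresh sample every step
    y = np.tanh(x @ w_star / np.sqrt(d))
    pred = np.tanh(x @ w / np.sqrt(d))
    grad = (pred - y) * (1 - pred ** 2) * x / np.sqrt(d)   # squared-loss gradient
    w -= lr * grad
    if t % 4000 == 0:
        print(f"step {t:6d}: overlap m = {w @ w_star / d:+.3f}")
```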

Depthwise hyperparameter transfer in residual networks: Dynamics and scaling limit

B Bordelon, L Noci, MB Li, B Hanin… - arXiv preprint arXiv …, 2023 - arxiv.org
The cost of hyperparameter tuning in deep learning has been rising with model sizes,
prompting practitioners to find new tuning methods using a proxy of smaller networks. One …
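
One concrete ingredient behind depthwise transfer, sketched under assumptions (a generic 1/sqrt(depth) residual-branch scaling, not necessarily the paper's exact parameterization): with this scaling, forward activations stay O(1) as depth grows, so a learning rate tuned on a shallow proxy remains a sensible choice at larger depth.

```python
# Residual forward pass with 1/sqrt(depth)-scaled branches: output norms stay
# comparable as depth grows, a prerequisite for depthwise hyperparameter transfer.
import numpy as np

def forward(x, weights, depth):
    h = x
    for W in weights:
        h = h + W @ np.tanh(h) / np.sqrt(depth)    # scaled residual branch
    return h

width, rng = 64, np.random.default_rng(2)
x = rng.standard_normal(width)
for depth in (4, 16, 64, 256):
    weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
    print(f"depth {depth:3d}: output norm {np.linalg.norm(forward(x, weights, depth)):7.2f}")
```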

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …
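
The object of study, sketched generically (linear function approximation on a random Markov chain; names and constants here are illustrative, not the paper's setup):

```python
# TD(0) with linear function approximation: semi-gradient updates driven by the
# temporal-difference error along a trajectory of a random Markov chain.
import numpy as np

n_states, n_feats, gamma, lr = 10, 4, 0.9, 0.01
rng = np.random.default_rng(3)
Phi = rng.standard_normal((n_states, n_feats))           # fixed random features
P = rng.dirichlet(np.ones(n_states), size=n_states)      # random transition matrix
r = rng.standard_normal(n_states)                        # per-state reward
w = np.zeros(n_feats)

s = 0
for _ in range(50000):
    s_next = rng.choice(n_states, p=P[s])
    td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += lr * td_error * Phi[s]                          # TD(0) semi-gradient step
    s = s_next
print("learned value estimates:", np.round(Phi @ w, 2))
```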

Phase transitions in the mini-batch size for sparse and dense two-layer neural networks

R Marino, F Ricci-Tersenghi - Machine Learning: Science and Technology, 2024 - iopscience.iop.org
Training artificial neural networks on mini-batches of data is now standard practice. Despite its broad usage, theories that quantitatively explain how large or small the …
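
A hedged sketch of the kind of sweep such an analysis motivates (toy data, a two-layer tanh network, and squared loss; all choices below are illustrative, not the paper's experiment): train at several mini-batch sizes B and compare outcomes.

```python
# Mini-batch SGD on a two-layer network, with batch size B as the control
# parameter one would sweep when looking for a transition in trainability.
import numpy as np

def train(B, steps=2000, d=50, k=8, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((2000, d))
    y = np.sign(X @ rng.standard_normal(d))          # planted linear teacher
    W = rng.standard_normal((k, d)) / np.sqrt(d)     # hidden layer
    a = rng.standard_normal(k) / np.sqrt(k)          # readout
    for _ in range(steps):
        idx = rng.choice(len(X), size=B, replace=False)
        h = np.tanh(X[idx] @ W.T)                    # (B, k) hidden activations
        err = h @ a - y[idx]                         # (B,) residuals
        a -= lr * h.T @ err / B
        W -= lr * ((err[:, None] * a) * (1 - h ** 2)).T @ X[idx] / B
    return np.mean(np.sign(np.tanh(X @ W.T) @ a) == y)

for B in (1, 8, 64, 512):
    print(f"B = {B:4d}: train accuracy {train(B):.3f}")
```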

High-dimensional learning of narrow neural networks

H Cui - arXiv preprint arXiv:2409.13904, 2024 - arxiv.org
Recent years have been marked by the fast-paced diversification and increasing ubiquity of machine learning applications. Yet, a firm theoretical understanding of the surprising …

Stochastic gradient descent in high dimensions for multi-spiked tensor PCA

GB Arous, C Gerbelot, V Piccolo - arXiv preprint arXiv:2410.18162, 2024 - arxiv.org
We study the dynamics in high dimensions of online stochastic gradient descent for the multi-spiked tensor model. This multi-index model arises from the tensor principal component …
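
A single-spike, order-3 special case, sketched under assumptions (the paper treats several planted spikes at once; step size, SNR, and dimension here are illustrative): online SGD ascends the correlation ⟨Y_t, w⊗w⊗w⟩ over the unit sphere, with a fresh noisy tensor each step.

```python
# Online spherical SGD for order-3 tensor PCA: maximize <Y, w⊗w⊗w> where
# Y_t = snr * v⊗v⊗v + fresh Gaussian noise, tracking the overlap <v, w>.
import numpy as np

d, snr, lr, steps = 40, 10.0, 2e-4, 2000
rng = np.random.default_rng(4)
v = rng.standard_normal(d); v /= np.linalg.norm(v)     # planted spike
w = rng.standard_normal(d); w /= np.linalg.norm(w)
if v @ w < 0:
    w = -w              # the objective is odd in w, so a sign flip costs nothing

signal = snr * np.einsum('i,j,k->ijk', v, v, v)
for t in range(steps):
    Y = signal + rng.standard_normal((d, d, d))        # fresh noisy tensor
    grad = (np.einsum('ijk,j,k->i', Y, w, w)           # exact gradient of
            + np.einsum('jik,j,k->i', Y, w, w)         # <Y, w⊗w⊗w> for a
            + np.einsum('jki,j,k->i', Y, w, w))        # non-symmetric Y
    w += lr * grad
    w /= np.linalg.norm(w)                             # retract to the sphere
    if t % 400 == 0:
        print(f"step {t:4d}: overlap {v @ w:+.3f}")
```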

How feature learning can improve neural scaling laws

B Bordelon, A Atanasov, C Pehlevan - arXiv preprint arXiv:2409.17858, 2024 - arxiv.org
We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical
analysis of this model shows how performance scales with model size, training time, and the …
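
The paper's territory in one hedged formula: a generic bottleneck ansatz of the kind such solvable models make precise, in which the loss decays as a power law in whichever resource (training time t or model size N) is currently limiting. Notation and exponents below are illustrative, not the paper's result; its point, roughly, is how dynamics beyond the kernel limit alter such exponents.

```latex
% Illustrative bottleneck scaling ansatz (not the paper's exact result):
% alpha and beta are task-dependent exponents, L_inf the irreducible loss.
\mathcal{L}(t, N) \;\approx\; \mathcal{L}_\infty + c_t\, t^{-\alpha} + c_N\, N^{-\beta}
```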

Stochastic gradient flow dynamics of test risk and its exact solution for weak features

R Veiga, A Remizova, N Macris - arXiv preprint arXiv:2402.07626, 2024 - arxiv.org
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation, we provide, in the regime of a small learning rate, a …
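
A deliberately naive counterpart to the paper's calculation, sketched under assumptions (noiseless gradient flow, plain linear regression, forward-Euler integration; the paper instead solves the stochastic dynamics via path integrals): integrate the flow and track the test risk along the trajectory.

```python
# Euler integration of gradient flow dw/dt = -grad(train loss) for linear
# regression, printing the test risk as a function of training time t.
import numpy as np

d, n, n_test, noise = 50, 100, 2000, 0.1
rng = np.random.default_rng(5)
w_star = rng.standard_normal(d) / np.sqrt(d)
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n)
y_test = X_test @ w_star + noise * rng.standard_normal(n_test)

w, dt = np.zeros(d), 1e-3
for step in range(10001):
    if step % 2000 == 0:
        risk = np.mean((X_test @ w - y_test) ** 2)
        print(f"t = {step * dt:5.1f}: test risk {risk:.4f}")
    w -= dt * X.T @ (X @ w - y) / n                 # Euler step of the flow
```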

High-dimensional optimization for multi-spiked tensor PCA

GB Arous, C Gerbelot, V Piccolo - arXiv preprint arXiv:2408.06401, 2024 - arxiv.org
We study the dynamics of two local optimization algorithms, online stochastic gradient
descent (SGD) and gradient flow, within the framework of the multi-spiked tensor model in …