A dynamical model of neural scaling laws

B Bordelon, A Atanasov, C Pehlevan - arXiv preprint arXiv:2402.01092, 2024 - arxiv.org
On a variety of tasks, the performance of neural networks predictably improves with training
time, dataset size and model size across many orders of magnitude. This phenomenon is …
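
A hedged illustration of what "predictable improvement" means in practice: fitting a power-law-plus-offset ansatz to a loss curve. The ansatz, constants, and data below are illustrative, not taken from the paper.

```python
# Fit a power-law scaling ansatz L(t) = a * t^(-alpha) + L_inf to a synthetic
# loss curve, the kind of summary behind "predictable improvement" claims.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(t, a, alpha, L_inf):
    return a * t ** (-alpha) + L_inf

t = np.logspace(1, 5, 50)                         # training steps, 10 .. 1e5
rng = np.random.default_rng(0)
loss = scaling_law(t, 5.0, 0.3, 0.1) * (1 + 0.02 * rng.standard_normal(t.size))

(a_hat, alpha_hat, L_inf_hat), _ = curve_fit(scaling_law, t, loss, p0=[1.0, 0.5, 0.0])
print(f"fitted exponent alpha ~ {alpha_hat:.3f}, irreducible loss ~ {L_inf_hat:.3f}")
```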

Hitting the high-dimensional notes: An ode for sgd learning dynamics on glms and multi-index models

E Collins-Woodfin, C Paquette… - Information and Inference: A Journal of the IMA, 2024 - academic.oup.com
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. …
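
A minimal sketch of the setting (assumed, not the paper's code): online SGD on a single-index model, tracking the overlap m = ⟨w, w*⟩/d, the kind of low-dimensional summary statistic whose evolution the limiting ODEs describe.

```python
# Online (one-pass) SGD on a single-index model y = tanh(<w*, x>/sqrt(d)),
# printing the overlap m = <w, w*>/d that the high-dimensional ODEs track.
import numpy as np

d, steps, lr = 1000, 20000, 0.5
rng = np.random.default_rng(1)
w_star = rng.standard_normal(d)
w_star *= np.sqrt(d) / np.linalg.norm(w_star)      # normalize so ||w*|| = sqrt(d)
w = rng.standard_normal(d)

for t in range(steps):
    x = rng.standard_normal(d)                     # fresh sample every step
    y = np.tanh(x @ w_star / np.sqrt(d))
    pred = np.tanh(x @ w / np.sqrt(d))
    grad = (pred - y) * (1 - pred ** 2) * x / np.sqrt(d)   # squared-loss gradient
    w -= lr * grad
    if t % 4000 == 0:
        print(f"step {t:6d}: overlap m = {w @ w_star / d:+.3f}")
```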

Depthwise hyperparameter transfer in residual networks: Dynamics and scaling limit

B Bordelon, L Noci, MB Li, B Hanin… - arXiv preprint arXiv …, 2023 - arxiv.org
The cost of hyperparameter tuning in deep learning has been rising with model sizes,
prompting practitioners to find new tuning methods using a proxy of smaller networks. One …
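
One concrete ingredient behind depthwise transfer, sketched under assumptions (a generic 1/sqrt(depth) residual-branch scaling, not necessarily the paper's exact parameterization): with this scaling, forward activations stay O(1) as depth grows, so a learning rate tuned on a shallow proxy remains a sensible choice at larger depth.

```python
# Residual forward pass with 1/sqrt(depth)-scaled branches: output norms stay
# comparable as depth grows, a prerequisite for depthwise hyperparameter transfer.
import numpy as np

def forward(x, weights, depth):
    h = x
    for W in weights:
        h = h + W @ np.tanh(h) / np.sqrt(depth)    # scaled residual branch
    return h

width, rng = 64, np.random.default_rng(2)
x = rng.standard_normal(width)
for depth in (4, 16, 64, 256):
    weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
    print(f"depth {depth:3d}: output norm {np.linalg.norm(forward(x, weights, depth)):7.2f}")
```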

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …
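
The object of study, sketched generically (linear function approximation on a random Markov chain; names and constants here are illustrative, not the paper's setup):

```python
# TD(0) with linear function approximation: semi-gradient updates driven by the
# temporal-difference error along a trajectory of a random Markov chain.
import numpy as np

n_states, n_feats, gamma, lr = 10, 4, 0.9, 0.01
rng = np.random.default_rng(3)
Phi = rng.standard_normal((n_states, n_feats))           # fixed random features
P = rng.dirichlet(np.ones(n_states), size=n_states)      # random transition matrix
r = rng.standard_normal(n_states)                        # per-state reward
w = np.zeros(n_feats)

s = 0
for _ in range(50000):
    s_next = rng.choice(n_states, p=P[s])
    td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += lr * td_error * Phi[s]                          # TD(0) semi-gradient step
    s = s_next
print("learned value estimates:", np.round(Phi @ w, 2))
```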

Phase transitions in the mini-batch size for sparse and dense two-layer neural networks

R Marino, F Ricci-Tersenghi - Machine Learning: Science and Technology, 2024 - iopscience.iop.org
Training artificial neural networks on mini-batches of data is now standard practice. Despite its broad usage, theories that quantitatively explain how large or small the …
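
A hedged sketch of the kind of sweep such an analysis motivates (toy data, a two-layer tanh network, and squared loss; all choices below are illustrative, not the paper's experiment): train at several mini-batch sizes B and compare outcomes.

```python
# Mini-batch SGD on a two-layer network, with batch size B as the control
# parameter one would sweep when looking for a transition in trainability.
import numpy as np

def train(B, steps=2000, d=50, k=8, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((2000, d))
    y = np.sign(X @ rng.standard_normal(d))          # planted linear teacher
    W = rng.standard_normal((k, d)) / np.sqrt(d)     # hidden layer
    a = rng.standard_normal(k) / np.sqrt(k)          # readout
    for _ in range(steps):
        idx = rng.choice(len(X), size=B, replace=False)
        h = np.tanh(X[idx] @ W.T)                    # (B, k) hidden activations
        err = h @ a - y[idx]                         # (B,) residuals
        a -= lr * h.T @ err / B
        W -= lr * ((err[:, None] * a) * (1 - h ** 2)).T @ X[idx] / B
    return np.mean(np.sign(np.tanh(X @ W.T) @ a) == y)

for B in (1, 8, 64, 512):
    print(f"B = {B:4d}: train accuracy {train(B):.3f}")
```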

High-dimensional learning of narrow neural networks

H Cui - arXiv preprint arXiv:2409.13904, 2024 - arxiv.org
Recent years have been marked by the fast-paced diversification and increasing ubiquity of machine learning applications. Yet, a firm theoretical understanding of the surprising …

Stochastic gradient descent in high dimensions for multi-spiked tensor PCA

GB Arous, C Gerbelot, V Piccolo - arXiv preprint arXiv:2410.18162, 2024 - arxiv.org
We study the dynamics in high dimensions of online stochastic gradient descent for the multi-spiked tensor model. This multi-index model arises from the tensor principal component …
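
A single-spike, order-3 special case, sketched under assumptions (the paper treats several planted spikes at once; step size, SNR, and dimension here are illustrative): online SGD ascends the correlation ⟨Y_t, w⊗w⊗w⟩ over the unit sphere, with a fresh noisy tensor each step.

```python
# Online spherical SGD for order-3 tensor PCA: maximize <Y, w⊗w⊗w> where
# Y_t = snr * v⊗v⊗v + fresh Gaussian noise, tracking the overlap <v, w>.
import numpy as np

d, snr, lr, steps = 40, 10.0, 2e-4, 2000
rng = np.random.default_rng(4)
v = rng.standard_normal(d); v /= np.linalg.norm(v)     # planted spike
w = rng.standard_normal(d); w /= np.linalg.norm(w)
if v @ w < 0:
    w = -w              # the objective is odd in w, so a sign flip costs nothing

signal = snr * np.einsum('i,j,k->ijk', v, v, v)
for t in range(steps):
    Y = signal + rng.standard_normal((d, d, d))        # fresh noisy tensor
    grad = (np.einsum('ijk,j,k->i', Y, w, w)           # exact gradient of
            + np.einsum('jik,j,k->i', Y, w, w)         # <Y, w⊗w⊗w> for a
            + np.einsum('jki,j,k->i', Y, w, w))        # non-symmetric Y
    w += lr * grad
    w /= np.linalg.norm(w)                             # retract to the sphere
    if t % 400 == 0:
        print(f"step {t:4d}: overlap {v @ w:+.3f}")
```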

How feature learning can improve neural scaling laws

B Bordelon, A Atanasov, C Pehlevan - arXiv preprint arXiv:2409.17858, 2024 - arxiv.org
We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical
analysis of this model shows how performance scales with model size, training time, and the …
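
The paper's territory in one hedged formula: a generic bottleneck ansatz of the kind such solvable models make precise, in which the loss decays as a power law in whichever resource (training time t or model size N) is currently limiting. Notation and exponents below are illustrative, not the paper's result; its point, roughly, is how dynamics beyond the kernel limit alter such exponents.

```latex
% Illustrative bottleneck scaling ansatz (not the paper's exact result):
% alpha and beta are task-dependent exponents, L_inf the irreducible loss.
\mathcal{L}(t, N) \;\approx\; \mathcal{L}_\infty + c_t\, t^{-\alpha} + c_N\, N^{-\beta}
```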

Stochastic gradient flow dynamics of test risk and its exact solution for weak features

R Veiga, A Remizova, N Macris - arXiv preprint arXiv:2402.07626, 2024 - arxiv.org
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation, we provide, in the regime of a small learning rate, a …
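
A deliberately naive counterpart to the paper's calculation, sketched under assumptions (noiseless gradient flow, plain linear regression, forward-Euler integration; the paper instead solves the stochastic dynamics via path integrals): integrate the flow and track the test risk along the trajectory.

```python
# Euler integration of gradient flow dw/dt = -grad(train loss) for linear
# regression, printing the test risk as a function of training time t.
import numpy as np

d, n, n_test, noise = 50, 100, 2000, 0.1
rng = np.random.default_rng(5)
w_star = rng.standard_normal(d) / np.sqrt(d)
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_star + noise * rng.standard_normal(n)
y_test = X_test @ w_star + noise * rng.standard_normal(n_test)

w, dt = np.zeros(d), 1e-3
for step in range(10001):
    if step % 2000 == 0:
        risk = np.mean((X_test @ w - y_test) ** 2)
        print(f"t = {step * dt:5.1f}: test risk {risk:.4f}")
    w -= dt * X.T @ (X @ w - y) / n                 # Euler step of the flow
```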

High-dimensional optimization for multi-spiked tensor PCA

GB Arous, C Gerbelot, V Piccolo - arXiv preprint arXiv:2408.06401, 2024 - arxiv.org
We study the dynamics of two local optimization algorithms, online stochastic gradient
descent (SGD) and gradient flow, within the framework of the multi-spiked tensor model in …