Landscape and training regimes in deep learning

M Geiger, L Petrini, M Wyart - Physics Reports, 2021 - Elsevier
Deep learning algorithms are responsible for a technological revolution in a variety of tasks,
including image recognition and Go playing. Yet why they work is not understood. Ultimately …

Convex analysis of the mean field Langevin dynamics

A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean-field Langevin dynamics
has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
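
For context, the mean-field Langevin dynamics discussed here is usually written as a McKean-Vlasov stochastic differential equation. A schematic in standard notation (ours, not quoted from the paper), where $F$ is an objective over probability measures and $\frac{\delta F}{\delta\mu}$ its first variation:

\[
\mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}W_t,
\qquad \mu_t = \mathrm{Law}(X_t),
\]

where $W_t$ is standard Brownian motion and $\lambda > 0$ sets the noise level. Because the drift depends on the law $\mu_t$ of the solution itself, the density of $\mu_t$ evolves by a nonlinear Fokker-Planck equation, which is the connection the abstract alludes to.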

Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction

T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
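
A hedged sketch of the kind of time-space discretization the abstract refers to: $N$ particles (neurons) $X^1,\dots,X^N$ approximate the law $\mu$, and each takes a noisy gradient step. The notation is ours; the paper's exact scheme, including its stochastic-gradient and variance-reduced variants, may differ:

\[
X^i_{k+1} = X^i_k - \eta\,\nabla \frac{\delta F}{\delta \mu}(\hat{\mu}_k)(X^i_k) + \sqrt{2\eta\lambda}\,\xi^i_k,
\qquad \hat{\mu}_k = \frac{1}{N}\sum_{j=1}^{N}\delta_{X^j_k},
\]

with step size $\eta$, i.i.d. noise $\xi^i_k \sim \mathcal{N}(0, I)$, and the empirical measure $\hat{\mu}_k$ standing in for the population law. For a two-layer network, each particle is one neuron's parameter vector, and this update reduces to noisy gradient descent on the finite-width network.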

Feature-learning networks are consistent across widths at realistic scales

N Vyas, A Atanasov, B Bordelon… - Advances in …, 2024 - proceedings.neurips.cc
We study the effect of width on the dynamics of feature-learning neural networks across a
variety of architectures and datasets. Early in training, wide neural networks trained on …

Dynamics in deep classifiers trained with the square loss: Normalization, low rank, neural collapse, and generalization bounds

M Xu, A Rangamani, Q Liao, T Galanti, T Poggio - Research, 2023 - spj.science.org
We review several properties, old and new, of training overparameterized deep
networks under the square loss. We first consider a model of the dynamics of gradient flow …
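
For reference, the gradient-flow model invoked in such analyses is the continuous-time limit of gradient descent on the parameters $w$ (standard notation, not specific to this paper):

\[
\dot{w}(t) = -\nabla L\big(w(t)\big),
\]

recovered from the discrete update $w_{k+1} = w_k - \eta\,\nabla L(w_k)$ as the step size $\eta \to 0$.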

Phase diagram for two-layer ReLU neural networks at infinite-width limit

T Luo, ZQJ Xu, Z Ma, Y Zhang - Journal of Machine Learning Research, 2021 - jmlr.org
How a neural network behaves during training under different choices of hyperparameters
is an important question in the study of neural networks. In this work, inspired by the phase …

Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics

T Suzuki, A Nitanda, D Wu - The Eleventh International Conference …, 2023 - openreview.net
The mean-field Langevin dynamics is characterized by a stochastic differential equation that
arises from (noisy) gradient descent on an infinite-width two-layer neural network, which can …
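
Propagation of chaos means, roughly, that each particle in the interacting $N$-particle system behaves like an independent sample from the mean-field law, with an error that vanishes in $N$. A schematic form of a uniform-in-time guarantee (our paraphrase, not the paper's exact theorem):

\[
\sup_{t \ge 0}\; \mathcal{W}_2^2\big(\mathrm{Law}(X^1_t),\, \mu_t\big) \;\le\; \frac{C}{N},
\]

where $X^1_t$ is one particle of the $N$-particle system, $\mu_t$ is the mean-field limit, $\mathcal{W}_2$ is the 2-Wasserstein distance, and $C$ does not depend on $t$; the uniformity in $t$ is the point of the title.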

Embedding principle of loss landscape of deep neural networks

Y Zhang, Z Zhang, T Luo, ZJ Xu - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously
important. In this work, we prove an embedding principle that the loss landscape of a DNN …
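
In schematic form, the embedding principle says that any critical point of a narrower network reappears as a critical point of a wider one. Under our notation (an illustration, not the paper's statement): there exists a map $\mathcal{T}$ from narrow parameters to wide parameters such that

\[
L_{\mathrm{wide}}\big(\mathcal{T}(\theta)\big) = L_{\mathrm{narrow}}(\theta)
\;\; \text{for all } \theta,
\qquad
\nabla L_{\mathrm{narrow}}(\theta) = 0 \;\Longrightarrow\; \nabla L_{\mathrm{wide}}\big(\mathcal{T}(\theta)\big) = 0,
\]

so the wider landscape "contains" every narrower landscape's critical points.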

Neural collapse in deep homogeneous classifiers and the role of weight decay

A Rangamani… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Neural Collapse is a phenomenon recently discovered in deep classifiers where the last-layer
activations collapse onto their class means, while the means and last-layer weights …
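
In symbols, the collapse described above (NC1 in the usual neural-collapse taxonomy) says that the within-class variability of last-layer features vanishes during the terminal phase of training. A minimal formalization in our notation, not quoted from the paper:

\[
\Sigma_W := \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}
\big(h(x_{c,i}) - \mu_c\big)\big(h(x_{c,i}) - \mu_c\big)^{\!\top} \;\to\; 0,
\qquad
\mu_c := \frac{1}{n_c}\sum_{i=1}^{n_c} h(x_{c,i}),
\]

where $h(x_{c,i})$ is the penultimate-layer feature of the $i$-th training example of class $c$, $\mu_c$ is the class mean, and $\Sigma_W$ is the within-class covariance.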