Landscape and training regimes in deep learning

M Geiger, L Petrini, M Wyart - Physics Reports, 2021 - Elsevier
Deep learning algorithms are responsible for a technological revolution in a variety of tasks,
including image recognition and Go playing. Yet why they work is not understood. Ultimately …

Convex analysis of the mean field Langevin dynamics

A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean-field Langevin dynamics
has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
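
For context, the mean-field Langevin dynamics discussed here is usually written as a McKean-Vlasov stochastic differential equation. A schematic in standard notation (ours, not quoted from the paper), where $F$ is an objective over probability measures and $\frac{\delta F}{\delta\mu}$ its first variation:

\[
\mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}W_t,
\qquad \mu_t = \mathrm{Law}(X_t),
\]

where $W_t$ is standard Brownian motion and $\lambda > 0$ sets the noise level. Because the drift depends on the law $\mu_t$ of the solution itself, the density of $\mu_t$ evolves by a nonlinear Fokker-Planck equation, which is the connection the abstract alludes to.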

Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction

T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
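
A hedged sketch of the kind of time-space discretization the abstract refers to: $N$ particles (neurons) $X^1,\dots,X^N$ approximate the law $\mu$, and each takes a noisy gradient step. The notation is ours; the paper's exact scheme, including its stochastic-gradient and variance-reduced variants, may differ:

\[
X^i_{k+1} = X^i_k - \eta\,\nabla \frac{\delta F}{\delta \mu}(\hat{\mu}_k)(X^i_k) + \sqrt{2\eta\lambda}\,\xi^i_k,
\qquad \hat{\mu}_k = \frac{1}{N}\sum_{j=1}^{N}\delta_{X^j_k},
\]

with step size $\eta$, i.i.d. noise $\xi^i_k \sim \mathcal{N}(0, I)$, and the empirical measure $\hat{\mu}_k$ standing in for the population law. For a two-layer network, each particle is one neuron's parameter vector, and this update reduces to noisy gradient descent on the finite-width network.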

Feature-learning networks are consistent across widths at realistic scales

N Vyas, A Atanasov, B Bordelon… - Advances in …, 2024 - proceedings.neurips.cc
We study the effect of width on the dynamics of feature-learning neural networks across a
variety of architectures and datasets. Early in training, wide neural networks trained on …

Dynamics in deep classifiers trained with the square loss: Normalization, low rank, neural collapse, and generalization bounds

M Xu, A Rangamani, Q Liao, T Galanti, T Poggio - Research, 2023 - spj.science.org
We review several properties, old and new, of training overparameterized deep
networks under the square loss. We first consider a model of the dynamics of gradient flow …
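
For reference, the gradient-flow model invoked in such analyses is the continuous-time limit of gradient descent on the parameters $w$ (standard notation, not specific to this paper):

\[
\dot{w}(t) = -\nabla L\big(w(t)\big),
\]

recovered from the discrete update $w_{k+1} = w_k - \eta\,\nabla L(w_k)$ as the step size $\eta \to 0$.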

Phase diagram for two-layer ReLU neural networks at infinite-width limit

T Luo, ZQJ Xu, Z Ma, Y Zhang - Journal of Machine Learning Research, 2021 - jmlr.org
How a neural network behaves during training under different choices of hyperparameters
is an important question in the study of neural networks. In this work, inspired by the phase …

Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics

T Suzuki, A Nitanda, D Wu - The Eleventh International Conference …, 2023 - openreview.net
The mean-field Langevin dynamics is characterized by a stochastic differential equation that
arises from (noisy) gradient descent on an infinite-width two-layer neural network, which can …
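
Propagation of chaos means, roughly, that each particle in the interacting $N$-particle system behaves like an independent sample from the mean-field law, with an error that vanishes in $N$. A schematic form of a uniform-in-time guarantee (our paraphrase, not the paper's exact theorem):

\[
\sup_{t \ge 0}\; \mathcal{W}_2^2\big(\mathrm{Law}(X^1_t),\, \mu_t\big) \;\le\; \frac{C}{N},
\]

where $X^1_t$ is one particle of the $N$-particle system, $\mu_t$ is the mean-field limit, $\mathcal{W}_2$ is the 2-Wasserstein distance, and $C$ does not depend on $t$; the uniformity in $t$ is the point of the title.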

Embedding principle of loss landscape of deep neural networks

Y Zhang, Z Zhang, T Luo, ZJ Xu - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously
important. In this work, we prove an embedding principle that the loss landscape of a DNN …
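
In schematic form, the embedding principle says that any critical point of a narrower network reappears as a critical point of a wider one. Under our notation (an illustration, not the paper's statement): there exists a map $\mathcal{T}$ from narrow parameters to wide parameters such that

\[
L_{\mathrm{wide}}\big(\mathcal{T}(\theta)\big) = L_{\mathrm{narrow}}(\theta)
\;\; \text{for all } \theta,
\qquad
\nabla L_{\mathrm{narrow}}(\theta) = 0 \;\Longrightarrow\; \nabla L_{\mathrm{wide}}\big(\mathcal{T}(\theta)\big) = 0,
\]

so the wider landscape "contains" every narrower landscape's critical points.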

Neural collapse in deep homogeneous classifiers and the role of weight decay

A Rangamani… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Neural Collapse is a phenomenon recently discovered in deep classifiers where the last-layer
activations collapse onto their class means, while the means and last-layer weights …
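
In symbols, the collapse described above (NC1 in the usual neural-collapse taxonomy) says that the within-class variability of last-layer features vanishes during the terminal phase of training. A minimal formalization in our notation, not quoted from the paper:

\[
\Sigma_W := \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}
\big(h(x_{c,i}) - \mu_c\big)\big(h(x_{c,i}) - \mu_c\big)^{\!\top} \;\to\; 0,
\qquad
\mu_c := \frac{1}{n_c}\sum_{i=1}^{n_c} h(x_{c,i}),
\]

where $h(x_{c,i})$ is the penultimate-layer feature of the $i$-th training example of class $c$, $\mu_c$ is the class mean, and $\Sigma_W$ is the within-class covariance.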