A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics recently attracts attention due to its connection to (noisy) gradient descent on infinitely wide …
The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network …
T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on …
We overview several properties—old and new—of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow …
T Luo, ZQJ Xu, Z Ma, Y Zhang - Journal of Machine Learning Research, 2021 - jmlr.org
How neural network behaves during the training over different choices of hyperparameters is an important question in the study of neural networks. In this work, inspired by the phase …
T Suzuki, A Nitanda, D Wu - The Eleventh International Conference …, 2023 - openreview.net
The mean-field Langevin dynamics is characterized by a stochastic differential equation that arises from (noisy) gradient descent on an infinite-width two-layer neural network, which can …
Y Zhang, Z Zhang, T Luo, ZJ Xu - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the structure of loss landscape of deep neural networks (DNNs) is obviously important. In this work, we prove an embedding principle that the loss landscape of a DNN …
A Rangamani… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Neural Collapse is a phenomenon recently discovered in deep classifiers where the last layer activations collapse onto their class means, while the means and last layer weights …