High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
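For orientation, the setup sketched in the truncated abstract is the standard two-layer model; a minimal way to write it, together with one full-batch gradient step on the first layer under squared loss (the exact loss and step-size scaling used in the paper are not reproduced here and may differ), is
$f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\,\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x}), \qquad \boldsymbol{W}^{(1)}=\boldsymbol{W}^{(0)}-\eta\,\nabla_{\boldsymbol{W}}\,\frac{1}{n}\sum_{i=1}^{n}\bigl(f(\boldsymbol{x}_i)-y_i\bigr)^2 .$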

Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e. functions that depend on a 1-dimensional projection of the input …
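For context, a single-index model is a target of the form
$y = g\bigl(\langle \boldsymbol{w}^*, \boldsymbol{x}\rangle\bigr) + \varepsilon,$
i.e. the label depends on the input only through a one-dimensional projection $\langle \boldsymbol{w}^*, \boldsymbol{x}\rangle$; here $g$ is a scalar link function and $\varepsilon$ is noise (standard notation, not necessarily the paper's).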

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
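For reference, the "edge of stability" regime referred to in the title is usually described (following Cohen et al., 2021) as the phase in which the largest Hessian eigenvalue of the training loss hovers near the stability threshold $2/\eta$ for step size $\eta$, rather than remaining far below it as small-learning-rate analyses assume.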

Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of \textit{feature learning},
unlike their kernel (NTK) counterpart. Recent works have shown that mean-field neural …
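As a concrete instance of the targets named in the title, a $k$-sparse parity over inputs $\boldsymbol{x}\in\{\pm 1\}^d$ is
$y = \prod_{i\in S} x_i, \qquad S\subseteq[d],\ |S|=k,$
a function supported on only $k$ of the $d$ coordinates (standard definition; the paper's exact data model may add further assumptions).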

Convex analysis of the mean field langevin dynamics

A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean-field Langevin dynamics
has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
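The objective behind this convex-analytic view can be written (in commonly used notation, which may differ from the paper's) as an entropy-regularized functional over probability measures,
$\min_{\mu}\; F(\mu) + \lambda\,\mathrm{Ent}(\mu), \qquad \mathrm{Ent}(\mu)=\int \mu\log\mu,$
where $F$ is a convex loss functional of the parameter distribution $\mu$; noisy gradient descent on an infinitely wide two-layer network corresponds to a gradient flow of this objective.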

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in\mathbb{R}^d$ is …

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction

T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
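In commonly used notation (which may differ slightly from the paper's), the MFLD with distribution-dependent drift is the SDE
$\mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}B_t, \qquad \mu_t = \mathrm{Law}(X_t),$
where $\frac{\delta F}{\delta \mu}$ is the first variation of the functional $F$; when $F$ does not depend on $\mu$, this reduces to the usual Langevin dynamics.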

Sampling from the mean-field stationary distribution

Y Kook, MS Zhang, S Chewi… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or
equivalently, the complexity of minimizing a functional over the space of probability …
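In standard mean-field notation (assumed here), the stationary distribution in question satisfies the self-consistency condition
$\pi(x) \propto \exp\!\Bigl(-\tfrac{1}{\lambda}\,\frac{\delta F}{\delta \mu}(\pi)(x)\Bigr),$
so sampling from it can be viewed as finding the minimizer of the entropy-regularized functional $F(\mu)+\lambda\,\mathrm{Ent}(\mu)$ over probability measures, which is the equivalence the abstract refers to.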

Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective

Y Lu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an
important and challenging problem in machine learning. A canonical algorithm for finding the …
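In the lifted (mean-field) formulation standard in this line of work, a mixed Nash equilibrium is a saddle point of the bilinear problem
$\min_{\mu}\max_{\nu}\; \mathbb{E}_{x\sim\mu,\,y\sim\nu}\bigl[f(x,y)\bigr]$
over probability measures $\mu,\nu$ on the two players' strategy spaces, possibly with an entropic regularizer added to each player's objective.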

Estimating the rate-distortion function by Wasserstein gradient descent

Y Yang, S Eckstein, M Nutz… - Advances in Neural …, 2024 - proceedings.neurips.cc
In the theory of lossy compression, the rate-distortion (RD) function $R(D)$ describes how
much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion) …
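For reference, the rate-distortion function is defined (in its standard form; the paper's notation may differ) as
$R(D) = \inf_{P_{Y\mid X}:\ \mathbb{E}[\rho(X,Y)]\le D} I(X;Y),$
the smallest achievable mutual information between source $X$ and reconstruction $Y$ subject to the average distortion $\rho$ not exceeding $D$.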