High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
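For orientation, the setup sketched in the truncated abstract is the standard two-layer model; a minimal way to write it, together with one full-batch gradient step on the first layer under squared loss (the exact loss and step-size scaling used in the paper are not reproduced here and may differ), is
$f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\,\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x}), \qquad \boldsymbol{W}^{(1)}=\boldsymbol{W}^{(0)}-\eta\,\nabla_{\boldsymbol{W}}\,\frac{1}{n}\sum_{i=1}^{n}\bigl(f(\boldsymbol{x}_i)-y_i\bigr)^2 .$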

Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e. functions that depend on a 1-dimensional projection of the input …
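For context, a single-index model is a target of the form
$y = g\bigl(\langle \boldsymbol{w}^*, \boldsymbol{x}\rangle\bigr) + \varepsilon,$
i.e. the label depends on the input only through a one-dimensional projection $\langle \boldsymbol{w}^*, \boldsymbol{x}\rangle$; here $g$ is a scalar link function and $\varepsilon$ is noise (standard notation, not necessarily the paper's).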

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
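For reference, the "edge of stability" regime referred to in the title is usually described (following Cohen et al., 2021) as the phase in which the largest Hessian eigenvalue of the training loss hovers near the stability threshold $2/\eta$ for step size $\eta$, rather than remaining far below it as small-learning-rate analyses assume.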

Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of \textit{feature learning},
unlike their kernel (NTK) counterpart. Recent works have shown that mean-field neural …
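As a concrete instance of the targets named in the title, a $k$-sparse parity over inputs $\boldsymbol{x}\in\{\pm 1\}^d$ is
$y = \prod_{i\in S} x_i, \qquad S\subseteq[d],\ |S|=k,$
a function supported on only $k$ of the $d$ coordinates (standard definition; the paper's exact data model may add further assumptions).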

Convex analysis of the mean field langevin dynamics

A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean-field Langevin dynamics
has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
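The objective behind this convex-analytic view can be written (in commonly used notation, which may differ from the paper's) as an entropy-regularized functional over probability measures,
$\min_{\mu}\; F(\mu) + \lambda\,\mathrm{Ent}(\mu), \qquad \mathrm{Ent}(\mu)=\int \mu\log\mu,$
where $F$ is a convex loss functional of the parameter distribution $\mu$; noisy gradient descent on an infinitely wide two-layer network corresponds to a gradient flow of this objective.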

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in\mathbb{R}^d$ is …

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction

T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
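In commonly used notation (which may differ slightly from the paper's), the MFLD with distribution-dependent drift is the SDE
$\mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}B_t, \qquad \mu_t = \mathrm{Law}(X_t),$
where $\frac{\delta F}{\delta \mu}$ is the first variation of the functional $F$; when $F$ does not depend on $\mu$, this reduces to the usual Langevin dynamics.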

Sampling from the mean-field stationary distribution

Y Kook, MS Zhang, S Chewi… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or
equivalently, the complexity of minimizing a functional over the space of probability …
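In standard mean-field notation (assumed here), the stationary distribution in question satisfies the self-consistency condition
$\pi(x) \propto \exp\!\Bigl(-\tfrac{1}{\lambda}\,\frac{\delta F}{\delta \mu}(\pi)(x)\Bigr),$
so sampling from it can be viewed as finding the minimizer of the entropy-regularized functional $F(\mu)+\lambda\,\mathrm{Ent}(\mu)$ over probability measures, which is the equivalence the abstract refers to.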

Two-scale gradient descent ascent dynamics finds mixed nash equilibria of continuous games: A mean-field perspective

Y Lu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an
important and challenging problem in machine learning. A canonical algorithm for finding the …
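In the lifted (mean-field) formulation standard in this line of work, a mixed Nash equilibrium is a saddle point of the bilinear problem
$\min_{\mu}\max_{\nu}\; \mathbb{E}_{x\sim\mu,\,y\sim\nu}\bigl[f(x,y)\bigr]$
over probability measures $\mu,\nu$ on the two players' strategy spaces, possibly with an entropic regularizer added to each player's objective.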

Estimating the rate-distortion function by Wasserstein gradient descent

Y Yang, S Eckstein, M Nutz… - Advances in Neural …, 2024 - proceedings.neurips.cc
In the theory of lossy compression, the rate-distortion (RD) function $R(D)$ describes how
much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion) …
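For reference, the rate-distortion function is defined (in its standard form; the paper's notation may differ) as
$R(D) = \inf_{P_{Y\mid X}:\ \mathbb{E}[\rho(X,Y)]\le D} I(X;Y),$
the smallest achievable mutual information between source $X$ and reconstruction $Y$ subject to the average distortion $\rho$ not exceeding $D$.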