Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e., functions that depend on a one-dimensional projection of the input …
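For context, a single index model is standardly written as follows (our gloss; the snippet itself is truncated):

$$ f_\ast(\boldsymbol{x}) = g\big(\langle \boldsymbol{w}_\ast, \boldsymbol{x} \rangle\big), \qquad \boldsymbol{x} \in \mathbb{R}^d, \quad \|\boldsymbol{w}_\ast\| = 1, $$

where $g$ is a scalar link function and $\boldsymbol{w}_\ast$ the unknown index direction; sample-complexity results in this line of work are typically stated in terms of $d$ and properties of $g$ (e.g., its information exponent).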
Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This stands in stark contrast to practical wisdom and …
T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning, unlike the kernel (NTK) counterpart. Recent works have shown that mean-field neural …
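Concretely, in the mean-field regime a two-layer network is identified with a probability measure $\mu$ over neuron parameters (standard formulation, our gloss):

$$ f_\mu(\boldsymbol{x}) = \int h(\boldsymbol{x}; \theta)\, \mathrm{d}\mu(\theta) \approx \frac{1}{m} \sum_{i=1}^{m} h(\boldsymbol{x}; \theta_i), $$

so training moves the measure $\mu$ itself rather than a fixed random-feature kernel, which is what permits feature learning beyond the NTK regime.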
A Nitanda, D Wu, T Suzuki - International Conference on …, 2022 - proceedings.mlr.press
As an example of the nonlinear Fokker-Planck equation, the mean-field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
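In standard notation (our gloss, consistent with the MFLD literature), the dynamics reads

$$ \mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\, \mathrm{d}t + \sqrt{2\lambda}\, \mathrm{d}B_t, \qquad \mu_t = \mathrm{Law}(X_t), $$

where $F$ is a functional on probability measures, $\frac{\delta F}{\delta \mu}$ its first variation, and $\lambda$ the temperature; the associated Fokker-Planck equation is nonlinear precisely because the drift depends on $\mu_t$.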
We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
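A minimal runnable sketch of such a setup is below, assuming details the truncated snippet leaves out (ReLU activation, squared loss, a hypothetical single-index target, mean-field $1/m$ output scaling, and one-pass online SGD with fresh Gaussian inputs); it is an illustration, not the paper's exact setting.

import numpy as np

# One-pass online SGD on a width-m two-layer ReLU network
#   f(x) = (1/m) * sum_i a_i * relu(<w_i, x>)
# with fresh inputs x ~ N(0, I_d) at each step and squared loss.
rng = np.random.default_rng(0)
d, m = 32, 512              # input dimension, network width
eta = 0.05 * m              # width-scaled step size: the 1/m output
                            # scaling shrinks per-neuron gradients by 1/m
steps = 5000

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)               # hypothetical index direction

W = rng.standard_normal((m, d)) / np.sqrt(d)   # trainable first layer
a = rng.choice([-1.0, 1.0], size=m)            # frozen second layer (common simplification)

for _ in range(steps):
    x = rng.standard_normal(d)           # fresh sample: online / one-pass SGD
    y = max(w_star @ x, 0.0)             # target: relu(<w_star, x>)
    pre = W @ x                          # pre-activations, shape (m,)
    pred = a @ np.maximum(pre, 0.0) / m  # network output with 1/m scaling
    err = pred - y
    # d(0.5*err^2)/dW_i = err * (a_i/m) * 1{pre_i > 0} * x
    W -= eta * np.outer(err * a * (pre > 0.0) / m, x)

# Evaluate on fresh samples.
test_err = []
for _ in range(1000):
    x = rng.standard_normal(d)
    test_err.append((a @ np.maximum(W @ x, 0.0) / m - max(w_star @ x, 0.0)) ** 2)
print("mean squared error on fresh samples:", float(np.mean(test_err)))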
T Suzuki, D Wu, A Nitanda - Advances in Neural …, 2024 - proceedings.neurips.cc
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
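The sketch below is an N-particle, Euler-discretized instance of MFLD for one concrete energy of our choosing (the truncated snippet does not specify one); the distribution-dependent drift appears as each particle being driven by the empirical law of all particles.

import numpy as np

# Illustrative energy: F(mu) = E_mu[V(X)] + 0.5 * E_{X,Y~mu}[W(X - Y)]
# with V(x) = |x|^2/2 and W(z) = |z|^2/2, plus entropy at temperature lam.
rng = np.random.default_rng(1)
N, d = 256, 2                        # number of particles, dimension
eta, lam, steps = 0.01, 0.1, 2000    # step size, temperature, iterations

X = 3.0 * rng.standard_normal((N, d))   # initial particle positions

for _ in range(steps):
    grad_V = X                           # grad V(x) = x
    grad_W = X - X.mean(axis=0)          # (1/N) sum_j grad W(X_i - X_j)
    noise = rng.standard_normal((N, d))
    X = X - eta * (grad_V + grad_W) + np.sqrt(2.0 * lam * eta) * noise

# For this quadratic energy the stationary law is Gaussian; the empirical
# second moment should settle near lam / 2 per coordinate.
print("empirical second moment per coordinate:", float((X ** 2).mean()))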
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability …
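The equivalence the snippet alludes to is standard: for an objective $F(\mu) + \lambda \int \mu \log \mu$, the minimizer, which is also the stationary law of the corresponding mean-field SDE, satisfies the Gibbs fixed-point equation

$$ \mu_\ast(x) \propto \exp\!\Big( -\frac{1}{\lambda}\, \frac{\delta F}{\delta \mu}(\mu_\ast)(x) \Big), $$

so sampling from the stationary distribution and minimizing the functional over probability measures are two views of the same problem.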
Y Lu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an important and challenging problem in machine learning. A canonical algorithm for finding the …
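In mixed strategies, such a game is usually written as a minimax problem over probability measures (standard formulation, our gloss):

$$ \min_{\mu \in \mathcal{P}(\mathcal{X})} \max_{\nu \in \mathcal{P}(\mathcal{Y})} \int\!\!\int K(x, y)\, \mathrm{d}\mu(x)\, \mathrm{d}\nu(y), $$

where $K$ is the payoff kernel; entropic regularization of both players is a common device to make the MNE unique and amenable to Langevin-type dynamics.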
In the theory of lossy compression, the rate-distortion (RD) function $R(D)$ describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion) …
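For reference, Shannon's rate-distortion function for a source $X$ and distortion measure $d(\cdot,\cdot)$ is

$$ R(D) = \min_{P_{\hat{X} \mid X}\,:\; \mathbb{E}[d(X, \hat{X})] \le D} I(X; \hat{X}), $$

the minimal mutual information between the source and its reconstruction subject to an average-distortion budget $D$.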