Mean field analysis of neural networks: A law of large numbers

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 296 Related articles All 2 versions

[PDF] arxiv.org

Propagation of chaos: a review of models, methods and applications. I. Models and methods

LP Chaintron, A Diez - arXiv preprint arXiv:2203.00446, 2022 - arxiv.org

The notion of propagation of chaos for large systems of interacting particles originates in
statistical physics and has recently become a central notion in many areas of applied …

Cited by 95 Related articles All 7 versions

[PDF] cambridge.org

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org

The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Cited by 310 Related articles All 12 versions

[PDF] neurips.cc

The shaped transformer: Attention models in the infinite depth-and-width limit

L Noci, C Li, M Li, B He, T Hofmann… - Advances in …, 2024 - proceedings.neurips.cc

In deep learning theory, the covariance matrix of the representations serves as a proxy to
examine the network's trainability. Motivated by the success of Transformers, we study the …

Cited by 21 Related articles All 7 versions

[HTML] nih.gov

Surprises in high-dimensional ridgeless least squares interpolation

T Hastie, A Montanari, S Rosset, RJ Tibshirani - Annals of statistics, 2022 - ncbi.nlm.nih.gov

Interpolators—estimators that achieve zero training error—have attracted growing attention
in machine learning, mainly because state-of-the-art neural networks appear to be models of …

Cited by 825 Related articles All 16 versions

[PDF] neurips.cc

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

E Boursier, L Pillaud-Vivien… - Advances in Neural …, 2022 - proceedings.neurips.cc

The training of neural networks by gradient descent methods is a cornerstone of the deep
learning revolution. Yet, despite some recent progress, a complete theory explaining its …

Cited by 55 Related articles All 12 versions

[PDF] arxiv.org

Propagation of chaos: a review of models, methods and applications. II. Applications

LP Chaintron, A Diez - arXiv preprint arXiv:2106.14812, 2021 - arxiv.org

The notion of propagation of chaos for large systems of interacting particles originates in
statistical physics and has recently become a central notion in many areas of applied …

Cited by 109 Related articles All 9 versions

[HTML] nih.gov

A selective overview of deep learning

J Fan, C Ma, Y Zhong - Statistical science: a review journal of the …, 2021 - ncbi.nlm.nih.gov

Deep learning has achieved tremendous success in recent years. In simple words, deep
learning uses the composition of many nonlinear functions to model the complex …

Cited by 210 Related articles All 14 versions

[PDF] arxiv.org

Mean-field Langevin dynamics: Exponential convergence and annealing

L Chizat - arXiv preprint arXiv:2202.01009, 2022 - arxiv.org

Noisy particle gradient descent (NPGD) is an algorithm to minimize convex functions over
the space of measures that include an entropy term. In the many-particle limit, this algorithm …

Cited by 62 Related articles All 6 versions

[PDF] arxiv.org

Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions

D Lacker - Probability and Mathematical Physics, 2023 - msp.org

This paper develops a nonasymptotic, local approach to quantitative propagation of chaos
for a wide class of mean field diffusive dynamics. For a system of n interacting particles, the …

Cited by 79 Related articles All 4 versions
