E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, the leap, which measures how …
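The snippet cuts off before the definition; as a hedged sketch of how the leap is defined in this line of work (my reconstruction, not quoted from the paper): write the target's Fourier expansion with support sets $S_1, \ldots, S_m$, and measure the largest number of new coordinates that must be picked up at once along the best learning order:

\[
  \mathrm{Leap}(f) \;=\; \min_{\pi \in \mathfrak{S}_m}\; \max_{1 \le i \le m}
  \Bigl|\, S_{\pi(i)} \setminus \textstyle\bigcup_{j<i} S_{\pi(j)} \,\Bigr|.
\]

Leap-1 targets are the "staircase" functions that SGD can pick up one coordinate at a time; the paper relates larger leaps to longer SGD time scales.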
One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features …
We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the …
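A hedged reading of the truncated sentence, under the standard setup for this problem: the targets are compositions

\[
  h(x) \;=\; g\bigl(p(x)\bigr), \qquad x \sim \mathcal{N}(0, I_d),
\]

where $p : \mathbb{R}^d \to \mathbb{R}$ is a low-degree polynomial playing the role of a hidden feature and $g : \mathbb{R} \to \mathbb{R}$ is applied on top; the exact degree constraints on $g$ and $p$ are cut off in the snippet. The three-layer architecture mirrors this hierarchy: the inner layers can learn $p$, after which fitting $g$ is a one-dimensional problem.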
J Ma, S Fattahi - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have more desirable …
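The parameterization in question, sketched under the usual conventions for depth-$L$ linear networks (the snippet does not spell it out):

\[
  \min_{W_1, \ldots, W_L}\; \tfrac{1}{2}\,\bigl\| W_L W_{L-1} \cdots W_1 X - Y \bigr\|_F^2 .
\]

The end-to-end map $W_L \cdots W_1$ is still linear, so the function class is exactly that of linear regression, but the objective is nonconvex in the factors for $L \ge 2$; the claim concerns how this nonconvex landscape improves as $L$ grows.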
M Wang, R Yu, L Wu - arXiv preprint arXiv:2410.11474, 2024 - arxiv.org
Transformers have demonstrated exceptional in-context learning capabilities, yet theoretical understanding of the underlying mechanisms remains limited. A recent work …
We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous …
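Not necessarily the construction analyzed in this paper, but the standard building block for computing a maximum with ReLU networks is the exact two-input identity $\max(a,b) = \tfrac{1}{2}(a+b) + \tfrac{1}{2}\,\mathrm{ReLU}(a-b) + \tfrac{1}{2}\,\mathrm{ReLU}(b-a)$, composed in a binary tree of depth $\lceil \log_2 d \rceil$. A minimal sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max2(a, b):
    # Exact ReLU identity: max(a, b) = (a + b)/2 + |a - b|/2,
    # with |z| = relu(z) + relu(-z).
    return 0.5 * (a + b) + 0.5 * relu(a - b) + 0.5 * relu(b - a)

def max_tree(x):
    # Tournament of pairwise maxima: ceil(log2(d)) ReLU layers and O(d)
    # units overall. This tree computes the max exactly, so the paper's
    # L2 question concerns what smaller or shallower networks can achieve.
    vals = list(x)
    while len(vals) > 1:
        nxt = [max2(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:   # carry the unpaired element to the next level
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

x = np.random.randn(7)
assert np.isclose(max_tree(x), x.max())
```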
To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical …
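As a reminder of the setting (the standard MF formulation, not quoted from the truncated abstract): a width-$m$ two-layer network $\frac{1}{m}\sum_{i=1}^m a_i\,\sigma(w_i^\top x)$ is identified with the empirical measure $\rho_m$ over parameters $\theta = (a, w)$, and as $m \to \infty$ gradient descent converges to a Wasserstein gradient flow on distributions:

\[
  \partial_t \rho_t \;=\; \nabla_\theta \cdot \Bigl( \rho_t \, \nabla_\theta \frac{\delta \mathcal{L}}{\delta \rho}(\rho_t) \Bigr).
\]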
We study separations between two fundamental models (or Ansätze) of antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x$ …
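Completing only the standard definition (the specific pair of Ansätze being separated is cut off in the snippet): $f$ is antisymmetric if for every permutation $\sigma$,

\[
  f\bigl(x_{\sigma(1)}, \ldots, x_{\sigma(N)}\bigr) \;=\; \operatorname{sign}(\sigma)\, f\bigl(x_1, \ldots, x_N\bigr),
\]

the canonical example being the Slater determinant $\det\bigl[\phi_i(x_j)\bigr]_{i,j=1}^{N}$ built from single-particle functions $\phi_i$.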