Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …

Provable guarantees for nonlinear feature learning in three-layer neural networks

E Nichani, A Damian, JD Lee - Advances in Neural …, 2024 - proceedings.neurips.cc
One of the central questions in the theory of deep learning is to understand how neural
networks learn hierarchical features. The ability of deep networks to extract salient features …

Learning hierarchical polynomials with three-layer neural networks

Z Wang, E Nichani, JD Lee - arXiv preprint arXiv:2311.13774, 2023 - arxiv.org
We study the problem of learning hierarchical polynomials over the standard Gaussian
distribution with three-layer neural networks. We specifically consider target functions of the …

Blessing of depth in linear regression: Deeper models have flatter landscape around the true solution

J Ma, S Fattahi - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

How transformers implement induction heads: Approximation and optimization analysis

M Wang, R Yu, L Wu - arXiv preprint arXiv:2410.11474, 2024 - arxiv.org
Transformers have demonstrated exceptional in-context learning capabilities, yet the
theoretical understanding of the underlying mechanisms remains limited. A recent work …

How Many Neurons Does it Take to Approximate the Maximum?

I Safran, D Reichman, P Valiant - Proceedings of the 2024 Annual ACM-SIAM …, 2024 - SIAM
We study the size of a neural network needed to approximate the maximum function over d
inputs, in the most basic setting of approximating with respect to the L2 norm, for continuous …

A functional-space mean-field theory of partially-trained three-layer neural networks

Z Chen, E Vanden-Eijnden, J Bruna - arXiv preprint arXiv:2210.16286, 2022 - arxiv.org
To understand the training dynamics of neural networks (NNs), prior studies have
considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical …

Blessing of nonconvexity in deep linear models: Depth flattens the optimization landscape around the true solution

J Ma, S Fattahi - arXiv preprint arXiv:2207.07612, 2022 - arxiv.org
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

Towards antisymmetric neural ansatz separation

A Zweig, J Bruna - arXiv preprint arXiv:2208.03264, 2022 - arxiv.org
We study separations between two fundamental models (or Ansätze) of
antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x$ …