Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …

Provable guarantees for nonlinear feature learning in three-layer neural networks

E Nichani, A Damian, JD Lee - Advances in Neural …, 2024 - proceedings.neurips.cc
One of the central questions in the theory of deep learning is to understand how neural
networks learn hierarchical features. The ability of deep networks to extract salient features …

Learning hierarchical polynomials with three-layer neural networks

Z Wang, E Nichani, JD Lee - arXiv preprint arXiv:2311.13774, 2023 - arxiv.org
We study the problem of learning hierarchical polynomials over the standard Gaussian
distribution with three-layer neural networks. We specifically consider target functions of the …

Blessing of depth in linear regression: Deeper models have flatter landscape around the true solution

J Ma, S Fattahi - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

How transformers implement induction heads: Approximation and optimization analysis

M Wang, R Yu, L Wu - arXiv preprint arXiv:2410.11474, 2024 - arxiv.org
Transformers have demonstrated exceptional in-context learning capabilities, yet the
theoretical understanding of the underlying mechanisms remains limited. A recent work …

How Many Neurons Does it Take to Approximate the Maximum?

I Safran, D Reichman, P Valiant - Proceedings of the 2024 Annual ACM-SIAM …, 2024 - SIAM
We study the size of a neural network needed to approximate the maximum function over d
inputs, in the most basic setting of approximating with respect to the L2 norm, for continuous …

A functional-space mean-field theory of partially-trained three-layer neural networks

Z Chen, E Vanden-Eijnden, J Bruna - arXiv preprint arXiv:2210.16286, 2022 - arxiv.org
To understand the training dynamics of neural networks (NNs), prior studies have
considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical …

Blessing of nonconvexity in deep linear models: Depth flattens the optimization landscape around the true solution

J Ma, S Fattahi - arXiv preprint arXiv:2207.07612, 2022 - arxiv.org
This work characterizes the effect of depth on the optimization landscape of linear
regression, showing that, despite their nonconvexity, deeper models have more desirable …

Towards antisymmetric neural ansatz separation

A Zweig, J Bruna - arXiv preprint arXiv:2208.03264, 2022 - arxiv.org
We study separations between two fundamental models (or Ansätze) of
antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x$ …