Optimization-based separations for neural networks

I Safran, J Lee - Conference on Learning Theory, 2022 - proceedings.mlr.press
Depth separation results propose a possible theoretical explanation for the benefits of deep
neural networks over shallower architectures, establishing that the former possess superior …

Width is less important than depth in ReLU neural networks

G Vardi, G Yehudai, O Shamir - Conference on learning …, 2022 - proceedings.mlr.press
We solve an open question from Lu et al. (2017), by showing that any target network with
inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent …

On the optimal memorization power of ReLU neural networks

G Vardi, G Yehudai, O Shamir - arXiv preprint arXiv:2110.03187, 2021 - arxiv.org
We study the memorization power of feedforward ReLU neural networks. We show that such
networks can memorize any $N$ points that satisfy a mild separability assumption using …
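
As a point of reference for the statement above, here is a baseline sketch only, not the construction analyzed in the paper: $N$ points whose projections onto a random direction are distinct can always be memorized exactly by a depth-2 network with at most $N$ hidden ReLU units, via one-dimensional piecewise-linear interpolation. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 20, 5
X = rng.normal(size=(N, d))            # N points in R^d
y = rng.normal(size=N)                 # arbitrary real labels

# Project onto a random direction; with probability 1 the projections are distinct,
# which plays the role of the separability assumption here.
w = rng.normal(size=d)
t = X @ w
order = np.argsort(t)
t_s, y_s = t[order], y[order]

relu = lambda z: np.maximum(z, 0.0)

# Piecewise-linear interpolant of (t_s, y_s): an affine term plus one ReLU unit per
# interior breakpoint, i.e. at most N hidden neurons in a depth-2 network.
slopes = np.diff(y_s) / np.diff(t_s)

def f(z):
    out = y_s[0] + slopes[0] * (z - t_s[0])
    for k in range(1, N - 1):
        out = out + (slopes[k] - slopes[k - 1]) * relu(z - t_s[k])
    return out

assert np.allclose(f(t), y)            # every label is reproduced exactly
```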

Exponential separations in symmetric neural networks

A Zweig, J Bruna - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In this work we demonstrate a novel separation between symmetric neural network
architectures. Specifically, we consider the Relational Network …

Depth Separation in Norm-Bounded Infinite-Width Neural Networks

S Parkinson, G Ongie, R Willett, O Shamir… - arXiv preprint arXiv …, 2024 - arxiv.org
We study depth separation in infinite-width neural networks, where complexity is controlled
by the overall squared $\ell_2$-norm of the weights (sum of squares of all weights in the …
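
For context on this complexity measure, a two-layer, bias-free sketch of the standard rescaling identity (an assumed simplification; the paper itself works with deeper architectures): by the positive homogeneity of the ReLU, minimizing the squared $\ell_2$-norm over rescalings of each unit collapses it to a product-of-norms cost.

```latex
% For f(x) = \sum_i a_i\,\mathrm{ReLU}(w_i^\top x), the rescaling
% (a_i, w_i) \mapsto (\lambda_i a_i, w_i/\lambda_i) with \lambda_i > 0 leaves f unchanged,
% and by AM-GM the smallest attainable squared \ell_2-norm of the weights is
\min_{\lambda_i > 0}\; \frac{1}{2}\sum_i \Big( \lambda_i^2 a_i^2 + \frac{\|w_i\|_2^2}{\lambda_i^2} \Big)
  \;=\; \sum_i |a_i|\, \|w_i\|_2 .
```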

The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

L Gonon, R Graeber, A Jentzen - arXiv preprint arXiv:2301.08284, 2023 - arxiv.org
In this article we study high-dimensional approximation capacities of shallow and deep
artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular …

How Many Neurons Does it Take to Approximate the Maximum?

I Safran, D Reichman, P Valiant - Proceedings of the 2024 Annual ACM-SIAM …, 2024 - SIAM
We study the size of a neural network needed to approximate the maximum function over $d$
inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous …
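
A textbook construction that frames the question above (a sketch only; the paper's contribution is the precise size and depth trade-off, which this does not capture): the exact identity $\max(a,b) = a + \mathrm{ReLU}(b-a)$ yields a binary-tree network of depth roughly $\log_2 d$ computing the maximum of $d$ inputs. The helper names below are illustrative.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def pairwise_max(a, b):
    # Exact identity: max(a, b) = a + relu(b - a).
    return a + relu(b - a)

def tree_max(x):
    # Binary-tree reduction: about ceil(log2 d) layers of pairwise maxima.
    # (A strict ReLU network also needs extra units to pass values through layers.)
    vals = list(x)
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

x = np.random.default_rng(0).normal(size=7)
assert np.isclose(tree_max(x), x.max())
```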

Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality

S Wojtowytsch - Journal of Machine Learning Research, 2024 - jmlr.org
In this note, we study how neural networks with a single hidden layer and ReLU activation
interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin …

Spectral complexity of deep neural networks

S Di Lillo, D Marinucci, M Salvi, S Vigogna - arXiv preprint arXiv …, 2024 - arxiv.org
It is well-known that randomly initialized, push-forward, fully-connected neural networks
weakly converge to isotropic Gaussian processes, in the limit where the width of all layers …
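
A quick numerical illustration of the weak-convergence statement above (a minimal sketch under an assumed He-style initialization; the scalings and sizes are illustrative and not taken from the paper): sample a randomly initialized fully-connected ReLU network at several widths and check that the output distribution at a fixed input stabilizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net_output(x, width, depth=3):
    # One draw of a randomly initialized fully-connected ReLU network (He-style scaling).
    h = x
    for _ in range(depth):
        W = rng.normal(size=(width, h.shape[0])) * np.sqrt(2.0 / h.shape[0])
        h = np.maximum(W @ h, 0.0)
    v = rng.normal(size=width) / np.sqrt(width)
    return v @ h

x = np.ones(10)
for width in (8, 64, 512):
    samples = np.array([random_net_output(x, width) for _ in range(2000)])
    print(width, round(samples.mean(), 3), round(samples.std(), 3))
# As the width grows, the output law at a fixed input approaches a centered Gaussian,
# in line with the infinite-width Gaussian-process limit the abstract refers to.
```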

Rethink depth separation with intra-layer links

FL Fan, ZY Li, H Xiong, T Zeng - arXiv preprint arXiv:2305.07037, 2023 - arxiv.org
The depth separation theory is nowadays widely accepted as an effective explanation for the
power of depth, which consists of two parts: i) there exists a function representable by a …