We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma …
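A minimal NumPy sketch of this setup, assuming the standard completion of the truncated formula as $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x})$ with $\sigma=\mathrm{ReLU}$ and squared loss; the data model, target, and step size are illustrative placeholders, not the paper's protocol:

```python
import numpy as np

# Hedged sketch: one full-batch gradient step on the first-layer weights W
# of f(x) = a^T sigma(W x) / sqrt(N), assuming sigma = ReLU and squared loss.
# All hyperparameters and the stand-in target below are illustrative.
rng = np.random.default_rng(0)
d, N, n = 20, 50, 200                 # input dim, width, sample size (placeholders)
X = rng.standard_normal((n, d))       # isotropic Gaussian inputs (an assumption)
y = np.tanh(X[:, 0])                  # stand-in target; the true target is unspecified here

W = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.standard_normal(N)

def f(X, W, a):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(N)   # sigma = ReLU

# Gradient of the empirical squared loss (1/2n) * sum_i (f(x_i) - y_i)^2 w.r.t. W.
pre = X @ W.T                          # (n, N) pre-activations
err = f(X, W, a) - y                   # (n,) residuals
grad_W = ((pre > 0) * np.outer(err, a)).T @ X / (n * np.sqrt(N))

eta = 1.0                              # step size: a placeholder
W_one_step = W - eta * grad_W          # first-layer weights after one gradient step
```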
J Ba, MA Erdogdu, T Suzuki… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the learning of a single-index target function $f_*:\mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $$f_*(\boldsymbol{x})=\textstyle\sigma_*(\frac{1}{\sqrt …
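A hedged sketch of one such data model, assuming the common form $\boldsymbol{x}\sim\mathcal{N}(0,\,\boldsymbol{I}_d+\lambda\,\boldsymbol{u}\boldsymbol{u}^\top)$ with the target's index aligned to the spike; the spike strength, direction, and link $\sigma_*$ are illustrative choices, not the paper's:

```python
import numpy as np

# Hedged sketch of a spiked-covariance single-index model:
# x ~ N(0, I_d + lam * u u^T) and f_*(x) = sigma_*(<u, x> / sqrt(d)).
# Spike strength, direction, and the link sigma_* are placeholders.
rng = np.random.default_rng(1)
d, n, lam = 100, 500, 5.0
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                 # unit spike direction (an assumption)

Z = rng.standard_normal((n, d))
g = rng.standard_normal(n)
X = Z + np.sqrt(lam) * np.outer(g, u)  # covariance is exactly I_d + lam * u u^T

sigma_star = np.tanh                   # placeholder link function
y = sigma_star(X @ u / np.sqrt(d))     # single-index labels
```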
S Mei, A Montanari - Communications on Pure and Applied …, 2022 - Wiley Online Library
Deep learning methods operate in regimes that defy the traditional statistical mindset. Neural network architectures often contain more parameters than training samples, and are …
Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of …
Teacher-student models provide a framework in which the typical-case performance of high- dimensional supervised learning can be described in closed form. The assumptions of …
I Akjouj, M Barbier, M Clenet… - … of the Royal …, 2024 - royalsocietypublishing.org
Ecosystems represent archetypal complex dynamical systems, often modelled by coupled differential equations of the form $\frac{dx_i}{dt} = x_i\,\phi_i(x_1,\dots,x_N)$, where $N$ represents the number of …
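A minimal sketch of integrating dynamics of this form, assuming the common generalised Lotka-Volterra choice $\phi_i(x) = r_i - x_i + (\boldsymbol{A}\boldsymbol{x})_i$ with a random interaction matrix; none of the parameters below are taken from the cited article:

```python
import numpy as np

# Hedged sketch of dx_i/dt = x_i * phi_i(x_1, ..., x_N), with the common
# generalised Lotka-Volterra choice phi_i(x) = r_i - x_i + (A x)_i and a
# random interaction matrix A, integrated with explicit Euler steps.
rng = np.random.default_rng(2)
N = 50                                        # number of species (placeholder)
r = np.ones(N)                                # intrinsic growth rates (placeholder)
A = rng.standard_normal((N, N)) / np.sqrt(N)  # random interactions (an assumption)
np.fill_diagonal(A, 0.0)                      # self-regulation handled by the -x_i term

x = rng.uniform(0.5, 1.5, size=N)             # initial abundances
dt, steps = 1e-3, 20_000
for _ in range(steps):
    phi = r - x + A @ x                       # per-capita growth rates phi_i(x)
    x = np.maximum(x + dt * x * phi, 0.0)     # Euler step; clip at extinction
```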
H Hu, YM Lu - IEEE Transactions on Information Theory, 2022 - ieeexplore.ieee.org
We prove a universality theorem for learning with random features. Our result shows that, in terms of training and generalization errors, a random feature model with a nonlinear …
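A hedged numerical illustration of the kind of equivalence such a theorem formalises: ridge regression on nonlinear random features $\sigma(\boldsymbol{F}\boldsymbol{x}/\sqrt{d})$ compared with the matched linear-plus-noise ("Gaussian equivalent") features. The dimensions, the choice $\sigma=\tanh$, the linear teacher, and the ridge penalty are all assumptions for illustration:

```python
import numpy as np

# Hedged sketch: test error of ridge regression on random features
# sigma(F x / sqrt(d)) versus on the Gaussian-equivalent surrogate
# mu1 * F x / sqrt(d) + mu_* z. All model choices are illustrative.
rng = np.random.default_rng(3)
d, N, n, lam = 100, 150, 300, 0.1
F = rng.standard_normal((N, d))
beta = rng.standard_normal(d) / np.sqrt(d)

def data(m):
    X = rng.standard_normal((m, d))
    return X, X @ beta                 # linear teacher (a placeholder)

# Matching constants of sigma = tanh under N(0,1), estimated by Monte Carlo.
g = rng.standard_normal(1_000_000)
mu1 = np.mean(g * np.tanh(g))
mu_star = np.sqrt(np.mean(np.tanh(g) ** 2) - mu1 ** 2)

def ridge_test_error(features):
    X, y = data(n)
    Phi = features(X)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
    Xt, yt = data(2_000)
    return np.mean((features(Xt) @ w - yt) ** 2)

nonlinear = lambda X: np.tanh(X @ F.T / np.sqrt(d))
equivalent = lambda X: (mu1 * (X @ F.T) / np.sqrt(d)
                        + mu_star * rng.standard_normal((X.shape[0], N)))

# The two errors should be close as d, N, n grow proportionally.
print(ridge_test_error(nonlinear), ridge_test_error(equivalent))
```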
We study generalised linear regression and classification for a synthetically generated dataset encompassing different problems of interest, such as learning with random features …
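A short hedged sketch of the teacher-student data generation underlying such synthetic problems, assuming a one-dimensional Gaussian projection and two illustrative teacher links (identity for regression, sign for classification):

```python
import numpy as np

# Hedged sketch of teacher-student generalised linear data: labels come from
# a teacher link applied to a projection of Gaussian inputs. The teacher
# weights and both link choices below are illustrative assumptions.
rng = np.random.default_rng(4)
d, n = 200, 400
theta0 = rng.standard_normal(d)              # teacher weights (placeholder)
X = rng.standard_normal((n, d))
y_reg = X @ theta0 / np.sqrt(d)              # regression labels (identity link)
y_cls = np.sign(y_reg)                       # classification labels (sign link)
```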
Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The …