Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e., functions that depend on a one-dimensional projection of the input …
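
For concreteness, a minimal sketch of the single-index setting this entry refers to (the link function, network size, and step size below are my own illustrative assumptions, not taken from the paper): a Gaussian-input target y = g(⟨w*, x⟩) learned by online SGD on a small two-layer ReLU network, where feature learning shows up as first-layer rows aligning with the hidden direction w*.

```python
import numpy as np

# Hedged sketch (illustration only, not the paper's code): single-index target
# y = g(<w*, x>) with Gaussian inputs, trained by online SGD on a two-layer
# ReLU network. Link g and all hyperparameters are assumptions.
rng = np.random.default_rng(0)
d, m, lr, steps = 64, 32, 0.01, 5000

w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)          # hidden 1-dimensional direction
g = lambda z: z ** 2 - 1.0                # example link (second Hermite polynomial)

W = rng.normal(size=(m, d)) / np.sqrt(d)  # first-layer weights
a = rng.normal(size=m) / np.sqrt(m)       # second-layer weights

for _ in range(steps):
    x = rng.normal(size=d)
    y = g(w_star @ x)
    pre = W @ x
    h = np.maximum(pre, 0.0)              # ReLU features
    err = a @ h - y                       # residual of the squared loss
    grad_a = err * h
    grad_W = err * np.outer(a * (pre > 0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# Feature learning: first-layer rows align with the hidden direction w*.
print("max |<w_j, w*>| :", np.abs(W @ w_star).max())
```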

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
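
Schematically, the "one giant step" regime discussed here is a single large-batch gradient update of the first-layer weights (the scaling of the batch size n and learning rate η below is my paraphrase, not the paper's exact statement):

$$
W^{(1)} \;=\; W^{(0)} \;-\; \frac{\eta}{n}\sum_{i=1}^{n} \nabla_{W}\,\ell\!\left(f_{W^{(0)},a}(x_i),\, y_i\right),
\qquad n,\ \eta \ \text{large (e.g., polynomial in the input dimension } d\text{)},
$$

after which the second layer is refit on the updated features.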

On learning gaussian multi-index models with gradient flow

A Bietti, J Bruna, L Pillaud-Vivien - arXiv preprint arXiv:2310.19793, 2023 - arxiv.org
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
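
A standard way to write such a target, for reference (notation mine): an unknown low-rank linear map composed with an unknown link,

$$
f^*(x) \;=\; g\!\left(Ux\right), \qquad x \sim \mathcal{N}(0, I_d),\quad U \in \mathbb{R}^{k\times d},\ k \ll d,\quad g:\mathbb{R}^{k}\to\mathbb{R}.
$$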

The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents

Y Dandi, E Troiani, L Arnaboldi, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …

Pareto frontiers in deep feature learning: Data, compute, width, and luck

B Edelman, S Goel, S Kakade… - Advances in Neural …, 2024 - proceedings.neurips.cc
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are
known to modulate nuanced resource tradeoffs. This work investigates how these …

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B Simsek, A Bendjeddou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Any continuous function $f^*$ can be approximated arbitrarily well by a neural
network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a …

Agnostic active learning of single index models with linear sample complexity

A Gajjar, WM Tai, X Xingyu, C Hegde… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study active learning methods for single index models of the form $F(\bm x) = f(\langle \bm w, \bm x \rangle)$,
where $f:\mathbb{R}\to\mathbb{R}$ and $\bm x, \bm w \in \mathbb{R}^d$. In …

Hitting the high-dimensional notes: An ODE for SGD learning dynamics on GLMs and multi-index models

E Collins-Woodfin, C Paquette… - … and Inference: A …, 2024 - academic.oup.com
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-
dimensional limit when applied to generalized linear models and multi-index models (e.g., …
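
As a minimal sketch of the streaming (one fresh sample per step) SGD setting named in this snippet (teacher link, dimension, and step-size scaling are my assumptions): online SGD on a GLM teacher while tracking the low-dimensional summary statistics whose high-dimensional limit such ODE analyses describe.

```python
import numpy as np

# Hedged sketch (my illustration, not the paper's code): streaming SGD on a
# GLM teacher y = sigma(<w*, x>), one fresh Gaussian sample per step, with the
# step size scaled by 1/d. We track alignment <w, w*> and norm <w, w>, the
# summary statistics that the ODE-style analysis follows.
rng = np.random.default_rng(1)
d, lr, steps = 1000, 0.5, 20000

sigma = np.tanh
w_star = rng.normal(size=d) / np.sqrt(d)   # teacher direction, norm ~ 1
w = rng.normal(size=d) / np.sqrt(d)        # student initialization

for t in range(steps):
    x = rng.normal(size=d)                 # fresh sample: single-pass / streaming
    y = sigma(w_star @ x)
    pred = sigma(w @ x)
    grad = (pred - y) * (1.0 - pred ** 2) * x   # chain rule through tanh
    w -= (lr / d) * grad                        # dimension-scaled step size
    if t % 5000 == 0:
        print(t, w @ w_star, w @ w)             # alignment and norm statistics
```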

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - … of machine learning …, 2024 - pmc.ncbi.nlm.nih.gov
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …

Pareto frontiers in neural feature learning: Data, compute, width, and luck

BL Edelman, S Goel, S Kakade, E Malach… - arXiv preprint arXiv …, 2023 - arxiv.org
This work investigates the nuanced algorithm design choices for deep learning in the
presence of computational-statistical gaps. We begin by considering offline sparse parity …