Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e., functions that depend on a one-dimensional projection of the input …
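
For concreteness, a minimal sketch of the single-index setting this entry refers to (the link function, network size, and step size below are my own illustrative assumptions, not taken from the paper): a Gaussian-input target y = g(⟨w*, x⟩) learned by online SGD on a small two-layer ReLU network, where feature learning shows up as first-layer rows aligning with the hidden direction w*.

```python
import numpy as np

# Hedged sketch (illustration only, not the paper's code): single-index target
# y = g(<w*, x>) with Gaussian inputs, trained by online SGD on a two-layer
# ReLU network. Link g and all hyperparameters are assumptions.
rng = np.random.default_rng(0)
d, m, lr, steps = 64, 32, 0.01, 5000

w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)          # hidden 1-dimensional direction
g = lambda z: z ** 2 - 1.0                # example link (second Hermite polynomial)

W = rng.normal(size=(m, d)) / np.sqrt(d)  # first-layer weights
a = rng.normal(size=m) / np.sqrt(m)       # second-layer weights

for _ in range(steps):
    x = rng.normal(size=d)
    y = g(w_star @ x)
    pre = W @ x
    h = np.maximum(pre, 0.0)              # ReLU features
    err = a @ h - y                       # residual of the squared loss
    grad_a = err * h
    grad_W = err * np.outer(a * (pre > 0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# Feature learning: first-layer rows align with the hidden direction w*.
print("max |<w_j, w*>| :", np.abs(W @ w_star).max())
```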

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
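
Schematically, the "one giant step" regime discussed here is a single large-batch gradient update of the first-layer weights (the scaling of the batch size n and learning rate η below is my paraphrase, not the paper's exact statement):

$$
W^{(1)} \;=\; W^{(0)} \;-\; \frac{\eta}{n}\sum_{i=1}^{n} \nabla_{W}\,\ell\!\left(f_{W^{(0)},a}(x_i),\, y_i\right),
\qquad n,\ \eta \ \text{large (e.g., polynomial in the input dimension } d\text{)},
$$

after which the second layer is refit on the updated features.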

On learning gaussian multi-index models with gradient flow

A Bietti, J Bruna, L Pillaud-Vivien - arXiv preprint arXiv:2310.19793, 2023 - arxiv.org
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
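
A standard way to write such a target, for reference (notation mine): an unknown low-rank linear map composed with an unknown link,

$$
f^*(x) \;=\; g\!\left(Ux\right), \qquad x \sim \mathcal{N}(0, I_d),\quad U \in \mathbb{R}^{k\times d},\ k \ll d,\quad g:\mathbb{R}^{k}\to\mathbb{R}.
$$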

The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents

Y Dandi, E Troiani, L Arnaboldi, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …

Pareto frontiers in deep feature learning: Data, compute, width, and luck

B Edelman, S Goel, S Kakade… - Advances in Neural …, 2024 - proceedings.neurips.cc
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are
known to modulate nuanced resource tradeoffs. This work investigates how these …

Should Under-parameterized Student Networks Copy or Average Teacher Weights?

B Simsek, A Bendjeddou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Any continuous function $f^*$ can be approximated arbitrarily well by a neural
network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a …

Agnostic active learning of single index models with linear sample complexity

A Gajjar, WM Tai, X Xingyu, C Hegde… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study active learning methods for single index models of the form $F(\bm x) = f(\langle \bm w, \bm x \rangle)$,
where $f:\mathbb{R}\to\mathbb{R}$ and $\bm x, \bm w \in \mathbb{R}^d$. In …

Hitting the high-dimensional notes: An ODE for SGD learning dynamics on GLMs and multi-index models

E Collins-Woodfin, C Paquette… - … and Inference: A …, 2024 - academic.oup.com
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-
dimensional limit when applied to generalized linear models and multi-index models (e.g., …
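
As a minimal sketch of the streaming (one fresh sample per step) SGD setting named in this snippet (teacher link, dimension, and step-size scaling are my assumptions): online SGD on a GLM teacher while tracking the low-dimensional summary statistics whose high-dimensional limit such ODE analyses describe.

```python
import numpy as np

# Hedged sketch (my illustration, not the paper's code): streaming SGD on a
# GLM teacher y = sigma(<w*, x>), one fresh Gaussian sample per step, with the
# step size scaled by 1/d. We track alignment <w, w*> and norm <w, w>, the
# summary statistics that the ODE-style analysis follows.
rng = np.random.default_rng(1)
d, lr, steps = 1000, 0.5, 20000

sigma = np.tanh
w_star = rng.normal(size=d) / np.sqrt(d)   # teacher direction, norm ~ 1
w = rng.normal(size=d) / np.sqrt(d)        # student initialization

for t in range(steps):
    x = rng.normal(size=d)                 # fresh sample: single-pass / streaming
    y = sigma(w_star @ x)
    pred = sigma(w @ x)
    grad = (pred - y) * (1.0 - pred ** 2) * x   # chain rule through tanh
    w -= (lr / d) * grad                        # dimension-scaled step size
    if t % 5000 == 0:
        print(t, w @ w_star, w @ w)             # alignment and norm statistics
```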

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - … of machine learning …, 2024 - pmc.ncbi.nlm.nih.gov
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …

Pareto frontiers in neural feature learning: Data, compute, width, and luck

BL Edelman, S Goel, S Kakade, E Malach… - arXiv preprint arXiv …, 2023 - arxiv.org
This work investigates the nuanced algorithm design choices for deep learning in the
presence of computational-statistical gaps. We begin by considering offline sparse parity …