Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
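
For context, the $k$-sparse parity problem named in the title has a standard formulation; the symbols $S$, $\chi_S$, and $k$ below are illustrative notation, not drawn from the snippet:

$$x \sim \mathrm{Unif}(\{\pm 1\}^n), \qquad y = \chi_S(x) = \prod_{i \in S} x_i, \qquad S \subseteq [n],\ |S| = k.$$

Statistical-query arguments suggest a compute cost of roughly $n^{\Theta(k)}$ for this problem, which is plausibly the "computational limit" the title alludes to.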

Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single index models, i.e., functions that depend on a 1-dimensional projection of the input …
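
For reference, a single index model in this sense takes the form below; the symbols $f^*$, $\sigma$, and $\mathbf{w}^*$ are illustrative notation rather than the paper's:

$$f^*(\mathbf{x}) = \sigma(\langle \mathbf{w}^*, \mathbf{x} \rangle), \qquad \mathbf{w}^* \in \mathbb{R}^d, \quad \sigma : \mathbb{R} \to \mathbb{R},$$

so the target depends on $\mathbf{x}$ only through the 1-dimensional projection $\langle \mathbf{w}^*, \mathbf{x} \rangle$.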

A theoretical analysis on feature learning in neural networks: Emergence from inputs and advantage over fixed features

Z Shi, J Wei, Y Liang - arXiv preprint arXiv:2206.01717, 2022 - arxiv.org
An important characteristic of neural networks is their ability to learn representations of the
input data with effective features for prediction, which is believed to be a key factor to their …

Near-optimal cryptographic hardness of agnostically learning halfspaces and ReLU regression under Gaussian marginals

I Diakonikolas, D Kane, L Ren - International Conference on …, 2023 - proceedings.mlr.press
We study the task of agnostically learning halfspaces under the Gaussian distribution.
Specifically, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on …
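
In the agnostic setting the labels need not match any halfspace, and the learner competes with the best one; written out (with $\mathcal{H}$, $\mathrm{opt}$, and $\epsilon$ as assumed notation):

$$\mathrm{opt} = \min_{h \in \mathcal{H}} \Pr_{(\mathbf{x}, y)}\left[h(\mathbf{x}) \neq y\right], \qquad \mathcal{H} = \left\{\mathbf{x} \mapsto \mathrm{sign}(\langle \mathbf{w}, \mathbf{x} \rangle + b)\right\},$$

and the goal is a hypothesis $\hat{h}$ with error at most $\mathrm{opt} + \epsilon$.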

On learning Gaussian multi-index models with gradient flow

A Bietti, J Bruna, L Pillaud-Vivien - arXiv preprint arXiv:2310.19793, 2023 - arxiv.org
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
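
A multi-index function in this sense generalizes the single index models above; a standard way to write the class (notation assumed, not the paper's):

$$F(\mathbf{x}) = g(U\mathbf{x}), \qquad U \in \mathbb{R}^{r \times d},\ r \ll d, \quad g : \mathbb{R}^r \to \mathbb{R},$$

so the target depends on $\mathbf{x}$ only through an $r$-dimensional linear projection, the unknown low-rank linear map mentioned in the snippet.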

Random feature amplification: Feature learning and generalization in neural networks

S Frei, NS Chatterji, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In this work, we provide a characterization of the feature-learning process in two-layer ReLU
networks trained by gradient descent on the logistic loss following random initialization. We …
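
The training objective described in the snippet can be written out generically; the width $m$, the outer weights $a_j$, and the other symbols below are assumptions for illustration:

$$f(\mathbf{x}; W) = \sum_{j=1}^{m} a_j\, \mathrm{ReLU}(\langle \mathbf{w}_j, \mathbf{x} \rangle), \qquad \widehat{L}(W) = \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i f(\mathbf{x}_i; W)}\right),$$

with gradient descent run on $\widehat{L}$ from a random initialization of the $\mathbf{w}_j$.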

Statistical-query lower bounds via functional gradients

S Goel, A Gollakota, A Klivans - Advances in Neural …, 2020 - proceedings.neurips.cc
We give the first statistical-query lower bounds for agnostically learning any non-polynomial
activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific …
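
As background, a statistical-query learner sees the distribution only through noisy expectations: a query $\phi$ is answered by any value $v$ satisfying (with tolerance $\tau$ as standard notation)

$$\left| v - \mathbb{E}_{(\mathbf{x}, y)}\left[\phi(\mathbf{x}, y)\right] \right| \le \tau,$$

and SQ lower bounds count how many such queries any learner must make.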

Early-stopped neural networks are consistent

Z Ji, J Li, M Telgarsky - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This work studies the behavior of shallow ReLU networks trained with the logistic loss via
gradient descent on binary classification data where the underlying data distribution is …
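
Consistency here plausibly carries its usual statistical meaning: the risk of the early-stopped network $\hat{f}_n$ trained on $n$ samples approaches the Bayes risk (notation assumed),

$$\Pr\left[\mathrm{sign}(\hat{f}_n(\mathbf{x})) \neq y\right] \;\longrightarrow\; \inf_{f} \Pr\left[\mathrm{sign}(f(\mathbf{x})) \neq y\right] \quad (n \to \infty),$$

where the infimum runs over all measurable $f$.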

Proxy convexity: A unified framework for the analysis of neural networks trained by gradient descent

S Frei, Q Gu - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Although the optimization objectives for learning neural networks are highly non-convex,
gradient-based methods have been wildly successful at learning neural networks in …

Agnostic active learning of single index models with linear sample complexity

A Gajjar, WM Tai, X Xu, C Hegde… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study active learning methods for single index models of the form $F(\bm{x}) = f(\langle \bm{w}, \bm{x} \rangle)$, where $f:\mathbb{R}\to\mathbb{R}$ and $\bm{x}, \bm{w} \in \mathbb{R}^d$. In …