A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning

Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma …
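To make the snippet's setup concrete, here is a minimal numpy sketch of a two-layer model f(x) = (1/sqrt(N)) a^T sigma(W x) and a single full-batch gradient step on W. The teacher, the tanh activation, and the learning rate are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 50, 100          # samples, input dimension, hidden width (assumed)
lr = 1.0                        # illustrative learning rate for the single step

# synthetic single-index teacher (an assumption about the target)
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)

# two-layer student f(x) = (1/sqrt(N)) a^T sigma(W x), with sigma = tanh (assumed)
W = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.standard_normal(N)

def forward(W):
    H = np.tanh(X @ W.T)                 # (n, N) hidden activations
    return H @ a / np.sqrt(N), H

pred, H = forward(W)
resid = pred - y                          # (n,) residuals of the squared loss

# gradient of (1/2n) * sum of squared residuals with respect to W
dH = 1.0 - H**2                           # tanh'(W x) at the pre-activations
grad_W = ((resid[:, None] * dH) * a[None, :]).T @ X / (n * np.sqrt(N))

W_one_step = W - lr * grad_W              # the "one gradient step" on W
print("loss before:", 0.5 * np.mean(resid**2))
print("loss after :", 0.5 * np.mean((forward(W_one_step)[0] - y)**2))
```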

Learning in the presence of low-dimensional structure: a spiked random matrix perspective

J Ba, MA Erdogdu, T Suzuki… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the learning of a single-index target function $f_*:\mathbb{R}^d\to\mathbb{R}$
under spiked covariance data: $$f_*(\boldsymbol{x})=\textstyle\sigma_*(\frac{1}{\sqrt …
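A rough sketch of spiked-covariance inputs paired with a single-index label; the spike strength, the link function, and the choice of an index direction aligned with the spike are assumptions made for illustration, not the paper's precise model.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 200
theta = 5.0                                    # spike strength (assumed)

# spiked covariance Sigma = I_d + theta * u u^T, with a unit-norm spike direction u
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
Z = rng.standard_normal((n, d))
# X ~ N(0, Sigma), using Sigma^{1/2} = I + (sqrt(1 + theta) - 1) u u^T
X = Z + (np.sqrt(1.0 + theta) - 1.0) * (Z @ u)[:, None] * u[None, :]

# single-index target whose direction is aligned with the spike (an assumption)
sigma_star = lambda t: t + np.tanh(t)          # illustrative nonlinear link
y = sigma_star(X @ u)
print(X.shape, y.shape)
```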

What can a single attention layer learn? A study through the random features lens

H Fu, T Guo, Y Bai, S Mei - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Attention layers, which map a sequence of inputs to a sequence of outputs, are core
building blocks of the Transformer architecture, which has achieved significant …
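The following is a minimal single-head attention layer with randomly drawn, frozen weights, reflecting (under assumed dimensions and scalings) the "random features lens" named in the title; it is a sketch, not the paper's exact parametrization.

```python
import numpy as np

def softmax(A, axis=-1):
    A = A - A.max(axis=axis, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
L, d, dk = 8, 16, 16                      # sequence length, embedding dim, head dim (assumed)

X = rng.standard_normal((L, d))           # one input sequence; rows are tokens

# a single attention head with random, frozen query/key/value weights
W_Q = rng.standard_normal((d, dk)) / np.sqrt(d)
W_K = rng.standard_normal((d, dk)) / np.sqrt(d)
W_V = rng.standard_normal((d, d)) / np.sqrt(d)

scores = (X @ W_Q) @ (X @ W_K).T / np.sqrt(dk)   # (L, L) attention scores
attn = softmax(scores, axis=-1)                  # each row sums to 1
out = attn @ (X @ W_V)                           # (L, d): sequence in, sequence out
print(out.shape)
```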

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

E Abbe, EB Adsera… - Conference on Learning …, 2022 - proceedings.mlr.press
It is currently known how to characterize functions that neural networks can learn with SGD
for two extremal parametrizations: neural networks in the linear regime, and neural networks …

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
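A small sketch of fitting single-index data y = g(⟨w*, x⟩) with a shallow ReLU network trained by full-batch gradient descent; the link g, the width, and the step size are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m, steps, lr = 400, 20, 64, 1000, 0.2

# single-index data: y = g(<w*, x>) with an unknown direction w* and link g (assumed)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
g = lambda t: np.maximum(t, 0.0) - 0.3 * t        # illustrative link
X = rng.standard_normal((n, d))
y = g(X @ w_star)

# shallow ReLU network f(x) = a^T relu(W x), with a initialized at scale 1/sqrt(m)
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

for _ in range(steps):
    pre = X @ W.T                      # (n, m) pre-activations
    H = np.maximum(pre, 0.0)
    resid = H @ a - y                  # (n,)
    grad_a = H.T @ resid / n
    grad_W = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

print("train MSE:", np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2))
```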

Benign overfitting in ridge regression

A Tsigler, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In many modern applications of deep learning, the neural network has many more
parameters than the data points used for its training. Motivated by these practices, a large …
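For context, a minimal numpy example of ridge regression in an overparameterized setting (d much larger than n), using the kernel-form solution; taking the regularization to zero recovers the minimum-norm interpolator studied in this line of work. The data model here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 500                      # far more parameters than samples
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0                    # sparse signal (assumed)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # kernel form of ridge: w = X^T (X X^T + lam I)^{-1} y, convenient when d >> n
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

X_test = rng.standard_normal((1000, d))
for lam in [1e-8, 1.0, 100.0]:      # lam -> 0 recovers the minimum-norm interpolator
    w = ridge(X, y, lam)
    err = np.mean((X_test @ w - X_test @ w_true) ** 2)
    print(f"lam={lam:g}  train MSE={np.mean((X @ w - y)**2):.4f}  test MSE={err:.4f}")
```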

Random features for kernel approximation: A survey on algorithms, theory, and beyond

F Liu, X Huang, Y Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
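As a concrete instance of the technique surveyed here, a short random Fourier features sketch approximating the Gaussian (RBF) kernel in the style of Rahimi and Recht; the dimensions and bandwidth are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, D, sigma = 300, 10, 2000, 1.0     # D random features approximating an RBF kernel

X = rng.standard_normal((n, d))

# random Fourier features for k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
W = rng.standard_normal((d, D)) / sigma
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)      # feature map: z(x)^T z(y) ~= k(x, y)

K_exact = np.exp(-np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1) / (2 * sigma**2))
K_approx = Z @ Z.T
print("max abs error:", np.abs(K_exact - K_approx).max())
```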

Deterministic equivalent and error universality of deep random features learning

D Schröder, H Cui, D Dmitriev… - … on Machine Learning, 2023 - proceedings.mlr.press
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …
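A schematic of the setup described in the snippet, assuming specific widths, a tanh nonlinearity, and a ridge readout: a random Gaussian teacher network provides the target, and the student has frozen random intermediate layers with only the linear readout trained.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, width, depth, lam = 400, 30, 256, 3, 1e-3

X = rng.standard_normal((n, d))

# random Gaussian "teacher" network providing the target function (assumed form)
def random_net(X, dims, rng):
    H = X
    for i in range(len(dims) - 1):
        A = rng.standard_normal((dims[i], dims[i + 1])) / np.sqrt(dims[i])
        H = np.tanh(H @ A)
    return H

y = random_net(X, [d, width, 1], rng).ravel()

# student: frozen random intermediate layers, only the linear readout is trained
H = X
for _ in range(depth):
    in_dim = H.shape[1]
    W = rng.standard_normal((in_dim, width)) / np.sqrt(in_dim)   # frozen layer
    H = np.tanh(H @ W)

# ridge-regression readout on the last hidden layer
theta = np.linalg.solve(H.T @ H + lam * np.eye(width), H.T @ y)
print("train MSE:", np.mean((H @ theta - y) ** 2))
```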