The shape of learning curves: a review

T Viering, M Loog - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 - ieeexplore.ieee.org
Learning curves provide insight into the dependence of a learner's generalization
performance on the training set size. This important tool can be used for model selection, to …
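
Since the snippet defines a learning curve as generalization performance against training set size, a minimal sketch of how such a curve is computed empirically may help; the synthetic dataset and logistic-regression learner below are illustrative assumptions, not from the review.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:4000], y[:4000], X[4000:], y[4000:]

# One point of the learning curve per training set size n.
for n in [50, 100, 200, 500, 1000, 2000, 4000]:
    model = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    err = 1.0 - model.score(X_te, y_te)  # 0-1 error on a held-out test set
    print(f"n={n:5d}  test error={err:.3f}")
```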

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

What can transformers learn in-context? a case study of simple function classes

S Garg, D Tsipras, PS Liang… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In-context learning is the ability of a model to condition on a prompt sequence consisting of
in-context examples (input-output pairs corresponding to some task) along with a new query …
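
A minimal sketch of the prompt format the snippet describes, assuming the simplest function class studied in the paper (noiseless linear functions); the least-squares "reader" at the end is the classical baseline such a model is compared against, not the paper's transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 10                        # input dimension, number of in-context examples
w = rng.standard_normal(d)          # random task: f(x) = w @ x
xs = rng.standard_normal((k + 1, d))
ys = xs @ w

# The prompt interleaves (x_i, f(x_i)) pairs; the final x is the query
# whose label the model must produce.
prompt = [pair for i in range(k) for pair in (xs[i], ys[i])]
query_x, target_y = xs[k], ys[k]

# Baseline: read the in-context examples with least squares, which is
# optimal for noiseless linear tasks.
w_hat, *_ = np.linalg.lstsq(xs[:k], ys[:k], rcond=None)
print("prediction:", query_x @ w_hat, "  target:", target_y)
```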

Benign overfitting in ridge regression

A Tsigler, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In many modern applications of deep learning the neural network has many more
parameters than the data points used for its training. Motivated by those practices, a large …
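
A minimal numpy sketch of the regime the snippet refers to, under illustrative assumptions (a spiked covariance with a long tail of small eigenvalues and a signal in the top directions): a near-interpolating ridge fit drives training error to zero yet keeps excess test error far below that of the trivial zero predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5000, 5
lam = np.r_[np.ones(k), 0.01 * np.ones(p - k)]   # few large eigenvalues, long small tail
X = rng.standard_normal((n, p)) * np.sqrt(lam)
w = np.r_[np.ones(k), np.zeros(p - k)]           # signal in the top eigendirections
y = X @ w + 0.5 * rng.standard_normal(n)

# Ridge in dual form, valid for any alpha; alpha -> 0 gives the
# minimum-norm interpolator.
alpha = 1e-6
w_hat = X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)

Xt = rng.standard_normal((2000, p)) * np.sqrt(lam)
print("train MSE:", np.mean((X @ w_hat - y) ** 2))              # ~0: the fit interpolates
print("excess test MSE:", np.mean((Xt @ w_hat - Xt @ w) ** 2))  # well below ||w||^2 = 5
```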

Omnigrok: Grokking beyond algorithmic data

Z Liu, EJ Michaud, M Tegmark - The Eleventh International …, 2022 - openreview.net
Grokking, the unusual phenomenon for algorithmic datasets where generalization happens
long after overfitting the training data, has remained elusive. We aim to understand grokking …

Rethinking bias-variance trade-off for generalization of neural networks

Z Yang, Y Yu, C You, J Steinhardt… - International Conference on Machine Learning, 2020 - proceedings.mlr.press
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
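
A minimal sketch of how bias and variance are measured empirically in this line of work: train the same model on many independently drawn training sets and split the test error into squared bias and prediction variance. The target function, noise level, and one-hidden-layer regressor below are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                      # ground-truth function
x_te = np.linspace(-1, 1, 200).reshape(-1, 1)

preds = []
for trial in range(20):                          # independent training sets
    x = rng.uniform(-1, 1, (50, 1))
    y = f(x).ravel() + 0.1 * rng.standard_normal(50)
    m = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                     random_state=trial).fit(x, y)
    preds.append(m.predict(x_te))

preds = np.array(preds)                          # shape: (trials, test points)
bias2 = np.mean((preds.mean(axis=0) - f(x_te).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias2:.4f}  variance = {variance:.4f}")
```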

Classification vs regression in overparameterized regimes: Does the loss function matter?

V Muthukumar, A Narang, V Subramanian… - Journal of Machine …, 2021 - jmlr.org
We compare classification and regression tasks in an overparameterized linear model with
Gaussian features. On the one hand, we show that with sufficient overparameterization all …
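
A minimal sketch of the comparison in the snippet, on an assumed toy spiked-covariance model: one minimum-norm least-squares interpolator of the +/-1 labels is evaluated both as a regressor (squared loss) and as a classifier (0-1 loss), showing that the test-time loss function matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2000
lam = np.r_[25.0, np.ones(p - 1)]            # one spiked direction carries the signal
X = rng.standard_normal((n, p)) * np.sqrt(lam)
y = np.sign(X[:, 0])                         # +/-1 labels from the spiked feature

w_hat = X.T @ np.linalg.solve(X @ X.T, y)    # minimum-norm interpolator of the labels

Xt = rng.standard_normal((5000, p)) * np.sqrt(lam)
yt = np.sign(Xt[:, 0])
# Squared loss stays far from 0 while the 0-1 error is small: the same
# interpolator is a poor regressor but a decent classifier.
print("test squared loss:", np.mean((Xt @ w_hat - yt) ** 2))
print("test 0-1 error:   ", np.mean(np.sign(Xt @ w_hat) != yt))
```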

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

A Canatar, B Bordelon, C Pehlevan - Nature communications, 2021 - nature.com
A theoretical understanding of generalization remains an open problem for many machine
learning models, including deep networks where overparameterization leads to better …
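
A minimal sketch of the task-model alignment idea in the title: expand a target function in the eigenbasis of the kernel matrix and measure how much of its power falls in the top, fastest-learned eigenmodes. The RBF kernel, bandwidth, and the two targets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))  # RBF kernel matrix

vals, vecs = np.linalg.eigh(K)
vecs = vecs[:, ::-1]                           # columns sorted by decreasing eigenvalue

for name, y in [("smooth sin(2x) ", np.sin(2 * x)),
                ("rough  sin(40x)", np.sin(40 * x))]:
    coef = vecs.T @ y                          # target expanded in kernel eigenmodes
    top = np.sum(coef[:10] ** 2) / np.sum(coef ** 2)
    print(f"{name}: fraction of target power in top 10 modes = {top:.3f}")
```

A smooth target concentrates its power in the top eigenmodes (high alignment, fast learning); a rough one spreads it across slow modes, which is the spectral-bias picture of generalization.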

A model of double descent for high-dimensional binary linear classification

Z Deng, A Kammoun… - Information and Inference: A Journal of the IMA, 2022 - academic.oup.com
We consider a model for logistic regression where only a subset of features of size p is used
for training a linear classifier over n training samples. The classifier is obtained by running …
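
A minimal sketch of the double-descent curve in this setting: a linear classifier trained on a growing subset of p features over a fixed number of samples n, with least squares on +/-1 labels standing in for the paper's logistic training (an assumption). The test error typically peaks near p = n and descends again in the overparameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_total = 100, 1000
w = 1.0 / np.arange(1, p_total + 1)              # signal concentrated in early features
X = rng.standard_normal((n, p_total))
y = np.sign(X @ w)
Xt = rng.standard_normal((10000, p_total))
yt = np.sign(Xt @ w)

for p in [20, 50, 90, 100, 110, 200, 500, 1000]:
    # Least squares on the first p features; for p > n this is the
    # minimum-norm interpolator.
    w_hat, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
    err = np.mean(np.sign(Xt[:, :p] @ w_hat) != yt)
    print(f"p={p:5d}  test 0-1 error={err:.3f}")
```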

Double trouble in double descent: Bias and variance(s) in the lazy regime

S d'Ascoli, M Refinetti, G Biroli… - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Deep neural networks can achieve remarkable generalization performance while
interpolating the training data. Rather than the U-curve emblematic of the bias-variance …