The shape of learning curves: a review

T Viering, M Loog - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 - ieeexplore.ieee.org
Learning curves provide insight into the dependence of a learner's generalization
performance on the training set size. This important tool can be used for model selection, to …
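
Since the snippet defines a learning curve as generalization performance against training set size, a minimal sketch of how such a curve is computed empirically may help; the synthetic dataset and logistic-regression learner below are illustrative assumptions, not from the review.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, y_tr, X_te, y_te = X[:4000], y[:4000], X[4000:], y[4000:]

# One point of the learning curve per training set size n.
for n in [50, 100, 200, 500, 1000, 2000, 4000]:
    model = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    err = 1.0 - model.score(X_te, y_te)  # 0-1 error on a held-out test set
    print(f"n={n:5d}  test error={err:.3f}")
```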

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arXiv preprint, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

What can transformers learn in-context? a case study of simple function classes

S Garg, D Tsipras, PS Liang… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
In-context learning is the ability of a model to condition on a prompt sequence consisting of
in-context examples (input-output pairs corresponding to some task) along with a new query …
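
A minimal sketch of the prompt format the snippet describes, assuming the simplest function class studied in the paper (noiseless linear functions); the least-squares "reader" at the end is the classical baseline such a model is compared against, not the paper's transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 10                        # input dimension, number of in-context examples
w = rng.standard_normal(d)          # random task: f(x) = w @ x
xs = rng.standard_normal((k + 1, d))
ys = xs @ w

# The prompt interleaves (x_i, f(x_i)) pairs; the final x is the query
# whose label the model must produce.
prompt = [pair for i in range(k) for pair in (xs[i], ys[i])]
query_x, target_y = xs[k], ys[k]

# Baseline: read the in-context examples with least squares, which is
# optimal for noiseless linear tasks.
w_hat, *_ = np.linalg.lstsq(xs[:k], ys[:k], rcond=None)
print("prediction:", query_x @ w_hat, "  target:", target_y)
```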

Benign overfitting in ridge regression

A Tsigler, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In many modern applications of deep learning the neural network has many more
parameters than the data points used for its training. Motivated by those practices, a large …
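
A minimal numpy sketch of the regime the snippet refers to, under illustrative assumptions (a spiked covariance with a long tail of small eigenvalues and a signal in the top directions): a near-interpolating ridge fit drives training error to zero yet keeps excess test error far below that of the trivial zero predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5000, 5
lam = np.r_[np.ones(k), 0.01 * np.ones(p - k)]   # few large eigenvalues, long small tail
X = rng.standard_normal((n, p)) * np.sqrt(lam)
w = np.r_[np.ones(k), np.zeros(p - k)]           # signal in the top eigendirections
y = X @ w + 0.5 * rng.standard_normal(n)

# Ridge in dual form, valid for any alpha; alpha -> 0 gives the
# minimum-norm interpolator.
alpha = 1e-6
w_hat = X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), y)

Xt = rng.standard_normal((2000, p)) * np.sqrt(lam)
print("train MSE:", np.mean((X @ w_hat - y) ** 2))              # ~0: the fit interpolates
print("excess test MSE:", np.mean((Xt @ w_hat - Xt @ w) ** 2))  # well below ||w||^2 = 5
```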

Omnigrok: Grokking beyond algorithmic data

Z Liu, EJ Michaud, M Tegmark - The Eleventh International …, 2022 - openreview.net
Grokking, the unusual phenomenon for algorithmic datasets where generalization happens
long after overfitting the training data, has remained elusive. We aim to understand grokking …

Rethinking bias-variance trade-off for generalization of neural networks

Z Yang, Y Yu, C You, J Steinhardt… - International Conference on Machine Learning, 2020 - proceedings.mlr.press
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
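
A minimal sketch of how bias and variance are measured empirically in this line of work: train the same model on many independently drawn training sets and split the test error into squared bias and prediction variance. The target function, noise level, and one-hidden-layer regressor below are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                      # ground-truth function
x_te = np.linspace(-1, 1, 200).reshape(-1, 1)

preds = []
for trial in range(20):                          # independent training sets
    x = rng.uniform(-1, 1, (50, 1))
    y = f(x).ravel() + 0.1 * rng.standard_normal(50)
    m = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                     random_state=trial).fit(x, y)
    preds.append(m.predict(x_te))

preds = np.array(preds)                          # shape: (trials, test points)
bias2 = np.mean((preds.mean(axis=0) - f(x_te).ravel()) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias2:.4f}  variance = {variance:.4f}")
```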

Classification vs regression in overparameterized regimes: Does the loss function matter?

V Muthukumar, A Narang, V Subramanian… - Journal of Machine …, 2021 - jmlr.org
We compare classification and regression tasks in an overparameterized linear model with
Gaussian features. On the one hand, we show that with sufficient overparameterization all …
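
A minimal sketch of the comparison in the snippet, on an assumed toy spiked-covariance model: one minimum-norm least-squares interpolator of the +/-1 labels is evaluated both as a regressor (squared loss) and as a classifier (0-1 loss), showing that the test-time loss function matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2000
lam = np.r_[25.0, np.ones(p - 1)]            # one spiked direction carries the signal
X = rng.standard_normal((n, p)) * np.sqrt(lam)
y = np.sign(X[:, 0])                         # +/-1 labels from the spiked feature

w_hat = X.T @ np.linalg.solve(X @ X.T, y)    # minimum-norm interpolator of the labels

Xt = rng.standard_normal((5000, p)) * np.sqrt(lam)
yt = np.sign(Xt[:, 0])
# Squared loss stays far from 0 while the 0-1 error is small: the same
# interpolator is a poor regressor but a decent classifier.
print("test squared loss:", np.mean((Xt @ w_hat - yt) ** 2))
print("test 0-1 error:   ", np.mean(np.sign(Xt @ w_hat) != yt))
```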

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

A Canatar, B Bordelon, C Pehlevan - Nature communications, 2021 - nature.com
A theoretical understanding of generalization remains an open problem for many machine
learning models, including deep networks where overparameterization leads to better …
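
A minimal sketch of the task-model alignment idea in the title: expand a target function in the eigenbasis of the kernel matrix and measure how much of its power falls in the top, fastest-learned eigenmodes. The RBF kernel, bandwidth, and the two targets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))  # RBF kernel matrix

vals, vecs = np.linalg.eigh(K)
vecs = vecs[:, ::-1]                           # columns sorted by decreasing eigenvalue

for name, y in [("smooth sin(2x) ", np.sin(2 * x)),
                ("rough  sin(40x)", np.sin(40 * x))]:
    coef = vecs.T @ y                          # target expanded in kernel eigenmodes
    top = np.sum(coef[:10] ** 2) / np.sum(coef ** 2)
    print(f"{name}: fraction of target power in top 10 modes = {top:.3f}")
```

A smooth target concentrates its power in the top eigenmodes (high alignment, fast learning); a rough one spreads it across slow modes, which is the spectral-bias picture of generalization.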

A model of double descent for high-dimensional binary linear classification

Z Deng, A Kammoun… - Information and Inference: A Journal of the IMA, 2022 - academic.oup.com
We consider a model for logistic regression where only a subset of features of size p is used
for training a linear classifier over n training samples. The classifier is obtained by running …
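
A minimal sketch of the double-descent curve in this setting: a linear classifier trained on a growing subset of p features over a fixed number of samples n, with least squares on +/-1 labels standing in for the paper's logistic training (an assumption). The test error typically peaks near p = n and descends again in the overparameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_total = 100, 1000
w = 1.0 / np.arange(1, p_total + 1)              # signal concentrated in early features
X = rng.standard_normal((n, p_total))
y = np.sign(X @ w)
Xt = rng.standard_normal((10000, p_total))
yt = np.sign(Xt @ w)

for p in [20, 50, 90, 100, 110, 200, 500, 1000]:
    # Least squares on the first p features; for p > n this is the
    # minimum-norm interpolator.
    w_hat, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
    err = np.mean(np.sign(Xt[:, :p] @ w_hat) != yt)
    print(f"p={p:5d}  test 0-1 error={err:.3f}")
```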

Double trouble in double descent: Bias and variance(s) in the lazy regime

S d'Ascoli, M Refinetti, G Biroli… - International Conference on Machine Learning, 2020 - proceedings.mlr.press
Deep neural networks can achieve remarkable generalization performance while
interpolating the training data. Rather than the U-curve emblematic of the bias-variance …