High-dimensional dynamics of generalization error in neural networks

MS Advani, AM Saxe, H Sompolinsky - Neural Networks, 2020 - Elsevier
We perform an analysis of the average generalization dynamics of large neural networks
trained using gradient descent. We study the practically-relevant “high-dimensional” regime …
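
The snippet refers to the time course of generalization error under gradient descent. As a rough illustration (not the paper's model or parameters), the numpy sketch below trains a noisy high-dimensional linear regression by full-batch gradient descent and prints train and test error over time; with the dimension close to the sample size, the test error can eventually rise again, which is broadly the kind of dynamics such analyses study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear regression y = X w* + noise, with d close to n (illustrative sizes).
n, d, noise = 100, 80, 0.5
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ w_star + noise * rng.standard_normal(n)

# Large held-out set approximates the population (generalization) error.
X_test = rng.standard_normal((5000, d))
y_test = X_test @ w_star

w = np.zeros(d)
lr, steps = 0.01, 2000
for t in range(steps + 1):
    if t % 200 == 0:
        train_err = np.mean((X @ w - y) ** 2)
        test_err = np.mean((X_test @ w - y_test) ** 2)
        print(f"t={t:5d}  train MSE {train_err:.3f}  test MSE {test_err:.3f}")
    grad = X.T @ (X @ w - y) / n   # full-batch gradient of the squared loss
    w -= lr * grad
```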

How initial conditions affect generalization performance in large networks

A Atiya, C Ji - IEEE Transactions on Neural Networks, 1997 - ieeexplore.ieee.org
Generalization is one of the most important problems in neural-network research. It is
influenced by several factors in the network design, such as network size, weight decay …

Generalization error of generalized linear models in high dimensions

M Emami, M Sahraee-Ardakan… - International …, 2020 - proceedings.mlr.press
At the heart of machine learning lies the question of generalizability of learned rules over
previously unseen data. While over-parameterized models based on neural networks are …

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

S Goldt, M Advani, AM Saxe… - Advances in neural …, 2019 - proceedings.neurips.cc
Deep neural networks achieve stellar generalisation even when they have enough
parameters to easily fit all their training data. We study this phenomenon by analysing the …
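
A minimal numpy sketch of a teacher-student setup in the spirit of this abstract (the layer sizes, tanh activation, learning rate, and 1/d update scaling below are illustrative assumptions, not the paper's): a fixed random two-layer teacher generates labels, a two-layer student is trained by online SGD, and the generalization error is estimated on held-out inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m_teacher, m_student = 100, 2, 4   # input dim and hidden widths (illustrative)
lr, steps = 0.1, 20_000

# Fixed random teacher network.
W_t = rng.standard_normal((m_teacher, d)) / np.sqrt(d)
v_t = np.ones(m_teacher)

def forward(W, v, x):
    return v @ np.tanh(W @ x)

# Student initialized at random.
W_s = rng.standard_normal((m_student, d)) / np.sqrt(d)
v_s = rng.standard_normal(m_student) / np.sqrt(m_student)

# Held-out inputs to estimate the generalization error.
X_test = rng.standard_normal((2000, d))
y_test = np.array([forward(W_t, v_t, x) for x in X_test])

for step in range(steps):
    x = rng.standard_normal(d)            # online SGD: a fresh sample each step
    y = forward(W_t, v_t, x)
    h = np.tanh(W_s @ x)
    err = v_s @ h - y
    # Gradients of the squared loss 0.5 * err**2
    grad_v = err * h
    grad_W = err * np.outer(v_s * (1 - h ** 2), x)
    v_s -= lr * grad_v
    W_s -= (lr / d) * grad_W              # 1/d scaling, a common convention here
    if step % 5000 == 0:
        preds = np.tanh(X_test @ W_s.T) @ v_s
        eg = 0.5 * np.mean((preds - y_test) ** 2)
        print(f"step {step:6d}  generalization error ≈ {eg:.4f}")
```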

Wide neural networks of any depth evolve as linear models under gradient descent

J Lee, L Xiao, S Schoenholz, Y Bahri… - Advances in neural …, 2019 - proceedings.neurips.cc
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …
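
The claim is that a sufficiently wide network trained by gradient descent stays close to its first-order Taylor expansion around initialization, $f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)\cdot(\theta-\theta_0)$. A small JAX sketch of that linearized model (the MLP architecture and widths are arbitrary choices for illustration, not the paper's):

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random MLP parameters (wider layers make the linear approximation closer)."""
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def mlp(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def linearize(params0):
    """First-order Taylor expansion of the network around its initialization."""
    def f_lin(params, x):
        delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        f0, jvp = jax.jvp(lambda p: mlp(p, x), (params0,), (delta,))
        return f0 + jvp
    return f_lin

key = jax.random.PRNGKey(0)
params0 = init_mlp(key, [10, 512, 512, 1])
x = jax.random.normal(jax.random.PRNGKey(1), (5, 10))

f_lin = linearize(params0)
# At initialization the two models agree exactly; the result in the paper is that
# for wide enough layers they remain close throughout gradient-descent training.
print(mlp(params0, x))
print(f_lin(params0, x))
```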

Generalization in deep networks: The role of distance from initialization

V Nagarajan, JZ Kolter - arXiv preprint arXiv:1901.01672, 2019 - arxiv.org
Why does training deep neural networks using stochastic gradient descent (SGD) result in a
generalization error that does not worsen with the number of parameters in the network? To …
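
As a loose illustration of the quantity in the title (not the authors' experiments), the sketch below trains a small two-layer network with SGD on a toy regression task and reports the Euclidean distance of the weights from their initialization alongside the training error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer net on a toy regression task (all sizes illustrative).
d, h, n = 20, 200, 256
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
W2 = rng.standard_normal((h, 1)) / np.sqrt(h)
W1_0, W2_0 = W1.copy(), W2.copy()

lr, epochs, batch = 0.05, 50, 32
for epoch in range(epochs):
    for i in range(0, n, batch):
        xb, yb = X[i:i + batch], y[i:i + batch]
        a = np.tanh(xb @ W1)                     # hidden activations
        pred = (a @ W2).ravel()
        err = (pred - yb)[:, None]               # shape (batch, 1)
        grad_W2 = a.T @ err / len(xb)
        grad_a = err @ W2.T
        grad_W1 = xb.T @ (grad_a * (1 - a ** 2)) / len(xb)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    if epoch % 10 == 0 or epoch == epochs - 1:
        dist = np.sqrt(np.sum((W1 - W1_0) ** 2) + np.sum((W2 - W2_0) ** 2))
        mse = np.mean(((np.tanh(X @ W1) @ W2).ravel() - y) ** 2)
        print(f"epoch {epoch:3d}  train MSE {mse:.4f}  ||theta - theta_0|| = {dist:.3f}")
```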

Rethinking bias-variance trade-off for generalization of neural networks

Z Yang, Y Yu, C You, J Steinhardt… - … on Machine Learning, 2020 - proceedings.mlr.press
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
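
For reference, the classical decomposition can be estimated empirically by averaging a model's predictions over many resampled training sets: bias² is the squared gap between the average prediction and the target, and variance is the spread of predictions around that average. A small numpy sketch with a polynomial regressor (the target function, noise level, and degrees are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 200)
n_train, n_repeats, noise = 30, 200, 0.3

for degree in (1, 3, 9, 15):                      # degree controls model complexity
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = target(x) + noise * rng.standard_normal(n_train)
        coefs = np.polyfit(x, y, degree)          # least-squares polynomial fit
        preds[r] = np.polyval(coefs, x_grid)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - target(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree {degree:2d}  bias^2 ≈ {bias2:.3f}  variance ≈ {variance:.3f}")
```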

Generalization of two-layer neural networks: An asymptotic viewpoint

J Ba, M Erdogdu, T Suzuki, D Wu… - … conference on learning …, 2020 - openreview.net
This paper investigates the generalization properties of two-layer neural networks in high
dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to …

What can linearized neural networks actually say about generalization?

G Ortiz-Jiménez… - Advances in Neural …, 2021 - proceedings.neurips.cc
For certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully
characterizes generalization, but for the networks used in practice, the empirical NTK only …

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

A Canatar, B Bordelon, C Pehlevan - Nature Communications, 2021 - nature.com
A theoretical understanding of generalization remains an open problem for many machine
learning models, including deep networks where overparameterization leads to better …
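
One way to make "task-model alignment" concrete (a rough sketch, not the paper's exact formalism) is to eigendecompose a kernel Gram matrix and ask how much of the target function's power falls on the top kernel eigenmodes; fast saturation of this cumulative power indicates good alignment. The kernel, data, and target below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, length_scale = 500, 5, 1.0
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]          # illustrative target function

# RBF (Gaussian) kernel Gram matrix.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * length_scale ** 2))

# Kernel eigenmodes, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the target onto the eigenvectors: alignment is reflected in how
# quickly the cumulative power of these projections saturates.
proj = eigvecs.T @ y
cum_power = np.cumsum(proj ** 2) / np.sum(proj ** 2)

for k in (1, 5, 20, 100):
    print(f"top {k:3d} modes capture {cum_power[k - 1]:.2%} of the target power")
```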