Deep linear networks for regression are implicitly regularized towards flat minima

P Marion, L Chizat - arXiv preprint arXiv:2405.13456, 2024 - arxiv.org
The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity for
understanding their optimization dynamics. In this paper, we study the sharpness of deep linear …
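
Illustration (not from the paper): a minimal PyTorch sketch estimating the sharpness, i.e. the largest eigenvalue of the loss Hessian, of a small deep linear network via power iteration on Hessian-vector products. The depth-3 architecture, data, and iteration counts are hypothetical choices for the example.

import torch

torch.manual_seed(0)

# Toy depth-3 linear network trained with squared loss on synthetic data.
d, n, depth = 4, 32, 3
X = torch.randn(n, d)
Y = X @ torch.randn(d, d)                 # hypothetical regression targets
Ws = [(0.3 * torch.randn(d, d)).requires_grad_() for _ in range(depth)]

def loss():
    out = X
    for W in Ws:
        out = out @ W.T                   # apply W1, ..., W3 in turn
    return 0.5 * ((out - Y) ** 2).mean()

# Power iteration on Hessian-vector products; it returns the eigenvalue of
# largest magnitude, which coincides with the sharpness near a minimum.
g = torch.autograd.grad(loss(), Ws, create_graph=True)
flat_g = torch.cat([gi.reshape(-1) for gi in g])
v = torch.randn(flat_g.numel())
for _ in range(100):
    v = v / v.norm()
    hv = torch.autograd.grad(flat_g @ v, Ws, retain_graph=True)
    hv = torch.cat([h.reshape(-1) for h in hv])
    eig = v @ hv                          # Rayleigh quotient estimate
    v = hv
print("estimated sharpness (top Hessian eigenvalue):", eig.item())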

A convergence analysis of gradient descent for deep linear neural networks

S Arora, N Cohen, N Golowich, W Hu - arXiv preprint arXiv:1810.02281, 2018 - arxiv.org
We analyze the speed of convergence to the global optimum for gradient descent training a deep
linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by …
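
Illustration (not the paper's code): a short PyTorch sketch of plain gradient descent on this product parameterization, minimizing a least-squares objective over the end-to-end matrix $W_N \cdots W_1$. The dimensions, near-identity initialization, target matrix, and step size are hypothetical.

import torch

torch.manual_seed(0)

# Deep linear network as a product of N weight matrices, trained by
# gradient descent on 0.5 * ||W_N ... W_1 - target||_F^2.
d, N, steps, lr = 5, 3, 3000, 0.01
target = torch.randn(d, d)                # hypothetical target matrix
Ws = [(torch.eye(d) + 0.01 * torch.randn(d, d)).requires_grad_() for _ in range(N)]

opt = torch.optim.SGD(Ws, lr=lr)
for _ in range(steps):
    E = Ws[0]
    for W in Ws[1:]:
        E = W @ E                         # end-to-end matrix W_N ... W_1
    loss = 0.5 * (E - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("end-to-end loss after gradient descent:", loss.item())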

Global convergence of gradient descent for deep linear residual networks

L Wu, Q Wang, C Ma - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We analyze the global convergence of gradient descent for deep linear residual networks by
proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by …
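
Illustration only, under stated assumptions: a PyTorch sketch of a deep linear residual network x -> (I + A_L) ... (I + A_1) x with every residual block initialized at zero, which captures the "zero" part of the idea; the paper's full zero-asymmetric (ZAS) scheme additionally prescribes specific input/output layers that this toy example does not reproduce.

import torch

torch.manual_seed(0)

# Deep linear residual network with residual blocks A_l initialized to zero,
# trained by gradient descent on a least-squares objective.
d, L, steps, lr = 6, 10, 1000, 0.05
X = torch.randn(64, d)
Y = X @ torch.randn(d, d)                 # hypothetical linear targets
As = [torch.zeros(d, d, requires_grad=True) for _ in range(L)]

opt = torch.optim.SGD(As, lr=lr)
for _ in range(steps):
    H = X
    for A in As:
        H = H + H @ A.T                   # residual block: h <- (I + A) h
    loss = 0.5 * (H - Y).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("training loss from the zero residual initialization:", loss.item())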

Implicit regularization of discrete gradient dynamics in linear neural networks

G Gidel, F Bach… - Advances in Neural …, 2019 - proceedings.neurips.cc
When optimizing over-parameterized models, such as deep neural networks, a large set of
parameters can achieve zero training error. In such cases, the choice of the optimization …

Implicit bias in deep linear classification: Initialization scale vs training accuracy

E Moroshko, BE Woodworth… - Advances in neural …, 2020 - proceedings.neurips.cc
We provide a detailed asymptotic study of gradient flow trajectories and their implicit
optimization bias when minimizing the exponential loss over "diagonal linear networks". This …
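
Illustration (one common convention, not necessarily the paper's exact setup): a PyTorch sketch of a depth-2 diagonal linear network f(x) = <u * v, x> trained with the exponential loss on linearly separable synthetic data; the initialization scale alpha is the knob whose effect this line of work studies.

import torch

torch.manual_seed(0)

# Depth-2 diagonal linear network: the effective predictor is w = u * v
# (elementwise), trained on the exponential loss exp(-y_i <w, x_i>).
d, n, alpha, lr, steps = 10, 40, 0.1, 0.01, 5000
X = torch.randn(n, d)
y = torch.sign(X @ torch.randn(d))        # labels from a random linear teacher
u = torch.full((d,), alpha, requires_grad=True)
v = torch.full((d,), alpha, requires_grad=True)

opt = torch.optim.SGD([u, v], lr=lr)
for _ in range(steps):
    margins = y * (X @ (u * v))           # y_i * f(x_i)
    loss = torch.exp(-margins).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("effective linear predictor u * v:", (u * v).detach())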

Residual networks: Lyapunov stability and convex decomposition

K Nar, S Sastry - arXiv preprint arXiv:1803.08203, 2018 - arxiv.org
While the training error of most deep neural networks degrades as the depth of the network
increases, residual networks appear to be an exception. We show that the main reason for …

Gradient descent maximizes the margin of homogeneous neural networks

K Lyu, J Li - arXiv preprint arXiv:1906.05890, 2019 - arxiv.org
In this paper, we study the implicit regularization of the gradient descent algorithm in
homogeneous neural networks, including fully-connected and convolutional neural …
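
Illustration (hypothetical data and architecture): a PyTorch sketch of the scale-invariant normalized margin min_i y_i f(x_i) / ||theta||^L for an L-homogeneous network, here a bias-free two-layer ReLU network with homogeneity degree L = 2; this is the quantity whose evolution under gradient descent on exponential-type losses is analyzed in this line of work.

import torch

torch.manual_seed(0)

# Bias-free two-layer ReLU network: f(c * theta; x) = c^2 * f(theta; x),
# so the margin divided by ||theta||^2 is invariant to rescaling theta.
d, h, n = 5, 16, 30
W1 = torch.randn(h, d)
w2 = torch.randn(h)
X = torch.randn(n, d)
y = torch.sign(torch.randn(n))            # hypothetical +/-1 labels

def f(X):
    return torch.relu(X @ W1.T) @ w2      # no biases => 2-homogeneous

theta_sq = W1.pow(2).sum() + w2.pow(2).sum()      # ||theta||^2
normalized_margin = (y * f(X)).min() / theta_sq   # can be negative before training
print("normalized margin at this (random) parameter:", normalized_margin.item())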

On the explicit role of initialization on the convergence and implicit bias of overparametrized linear networks

H Min, S Tarmoun, R Vidal… - … Conference on Machine …, 2021 - proceedings.mlr.press
Neural networks trained via gradient descent with random initialization and without any
regularization enjoy good generalization performance in practice despite being highly …

Theoretical issues in deep networks

T Poggio, A Banburski, Q Liao - Proceedings of the …, 2020 - National Acad Sciences
While deep learning is successful in a number of applications, it is not yet well understood
theoretically. A theoretical characterization of deep learning should answer questions about …

The law of parsimony in gradient descent for learning deep linear networks

C Yaras, P Wang, W Hu, Z Zhu, L Balzano… - arXiv preprint arXiv …, 2023 - arxiv.org
Over the past few years, an extensively studied phenomenon in training deep networks is
the implicit bias of gradient descent towards parsimonious solutions. In this work, we …