We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by …
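For concreteness, the parameterization above is just a product of weight matrices applied to the input; the following minimal NumPy sketch illustrates it (the depth and layer widths are illustrative assumptions, not values from the paper):

    import numpy as np

    def deep_linear(weights, x):
        # Computes x -> W_N ... W_1 x: every layer is a plain matrix product
        # with no nonlinearity, so the network realizes a single linear map
        # factored into N matrices.
        h = x
        for W in weights:          # weights = [W_1, ..., W_N]
            h = W @ h
        return h

    rng = np.random.default_rng(0)
    dims = [8, 16, 16, 4]          # d_0, ..., d_N (illustrative only)
    weights = [rng.standard_normal((dims[i + 1], dims[i])) * 0.1
               for i in range(len(dims) - 1)]
    y = deep_linear(weights, rng.standard_normal(dims[0]))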
L Wu, Q Wang, C Ma - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by …
G Gidel, F Bach… - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization …
E Moroshko, BE Woodworth… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This …
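For reference, a common depth-2 form of a diagonal linear network (stated here as a standard construction, not necessarily the exact parameterization used in this paper) reparameterizes a linear predictor coordinate-wise as $w = u \odot v$, so that gradient descent acts on $(u, v)$ rather than on $w$ directly:

    import numpy as np

    def diagonal_linear_predict(u, v, X):
        # Depth-2 diagonal linear network: the effective predictor is
        # w = u * v (elementwise), i.e. f(x) = <u * v, x>. Training (u, v)
        # instead of w changes the implicit bias of gradient descent.
        return X @ (u * v)

    rng = np.random.default_rng(0)
    d, n = 20, 10
    X = rng.standard_normal((n, d))
    alpha = 0.1                     # initialization scale (illustrative)
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    preds = diagonal_linear_predict(u, v, X)

The initialization scale ($\alpha$ here) is one of the quantities such asymptotic analyses typically track.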
K Nar, S Sastry - arXiv preprint arXiv:1803.08203, 2018 - arxiv.org
While the training error of most deep neural networks degrades as the depth of the network increases, residual networks appear to be an exception. We show that the main reason for …
K Lyu, J Li - arXiv preprint arXiv:1906.05890, 2019 - arxiv.org
In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural …
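For reference, the quantity typically studied in this setting is the normalized margin of an $L$-homogeneous network $f(\theta; x)$, i.e. one satisfying $f(c\theta; x) = c^L f(\theta; x)$ for all $c > 0$; a standard definition (stated as background, not quoted from the paper) is

$\bar{\gamma}(\theta) = \min_i \, y_i f(\theta; x_i) / \|\theta\|_2^L,$

and results in this line of work show, roughly, that on exponentially-tailed classification losses gradient descent drives this normalized margin upward and converges in direction to a KKT point of the associated margin-maximization problem.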
H Min, S Tarmoun, R Vidal… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly …
While deep learning is successful in a number of applications, it is not yet well understood theoretically. A theoretical characterization of deep learning should answer questions about …
Over the past few years, an extensively studied phenomenon in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we …