Deep limits of residual neural networks

M Thorpe, Y van Gennip - arXiv preprint arXiv:1810.11741, 2018 - arxiv.org
Neural networks have been very successful in many applications; we often, however, lack a
theoretical understanding of what the neural networks are actually learning. This problem …

Implicit regularization of deep residual networks towards neural ODEs

P Marion, YH Wu, ME Sander, G Biau - arXiv preprint arXiv:2309.01213, 2023 - arxiv.org
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth
analogs, neural ordinary differential equations (ODEs), are also widely used. Despite their …
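
For context across several of these entries (Thorpe & van Gennip; Marion et al.; Kidger; Cohen et al.), the ResNet-to-ODE link they refer to is the Euler-discretization limit. A minimal sketch in notation of our own choosing (the 1/L step-size scaling is one common convention, not necessarily the one each paper adopts): a depth-L residual network updates its hidden state as

    x_{k+1} = x_k + (1/L) f(x_k, θ_k),   k = 0, …, L - 1,

and letting L → ∞ recovers the explicit Euler scheme for the neural ODE

    dx(t)/dt = f(x(t), θ(t)),   t ∈ [0, 1],

with x_0 the network input and x_L (resp. x(1)) the output.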

Overparameterization of deep ResNet: zero loss and mean-field analysis

Z Ding, S Chen, Q Li, SJ Wright - Journal of Machine Learning Research, 2022 - jmlr.org
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex
optimization problem, but a basic first-order optimization method (gradient descent) finds a …

Generalization bounds for neural ordinary differential equations and deep residual networks

P Marion - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Neural ordinary differential equations (neural ODEs) are a popular family of continuous-
depth deep learning models. In this work, we consider a large family of parameterized ODEs …

Local minima in training of neural networks

G Swirszcz, WM Czarnecki, R Pascanu - arXiv preprint arXiv:1611.06310, 2016 - arxiv.org
There has been a lot of recent interest in trying to characterize the error surface of deep
models. This stems from a long-standing question. Given that deep networks are highly …

On neural differential equations

P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great
interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … Conference on Machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Scaling properties of deep residual networks

AS Cohen, R Cont, A Rossier… - … Conference on Machine …, 2021 - proceedings.mlr.press
Residual networks (ResNets) have displayed impressive results in pattern recognition and,
recently, have garnered considerable theoretical interest due to a perceived link with neural …

Deep learning without poor local minima

K Kawaguchi - Advances in Neural Information Processing …, 2016 - proceedings.neurips.cc
In this paper, we prove a conjecture published in 1989 and also partially address an open
problem announced at the Conference on Learning Theory (COLT) 2015. For an expected …

Collapse of deep and narrow neural nets

L Lu, Y Su, GE Karniadakis - arXiv preprint arXiv:1808.04947, 2018 - arxiv.org
Recent theoretical work has demonstrated that deep neural networks have superior
performance over shallow networks, but their training is more difficult; e.g., they suffer from the …