Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analogs, neural ordinary differential equations (neural ODEs), are also widely used. Despite their …
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a …
P Marion - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs …
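The entries above turn on the ResNet/neural-ODE correspondence. A minimal sketch under illustrative choices (a shared tanh vector field with random weights; nothing here is taken from the papers): a residual update with step size 1/depth is exactly the explicit Euler scheme for dx/dt = f(x), so the discrete forward pass approaches an ODE solution as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                       # state dimension (illustrative)

# Shared vector field f(x) = tanh(Wx); the weights are illustrative stand-ins.
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))

def f(x):
    return np.tanh(W @ x)

def resnet_forward(x, depth):
    """Residual network with `depth` blocks and step size 1/depth:
    x_{k+1} = x_k + (1/depth) f(x_k), i.e. explicit Euler on dx/dt = f(x)."""
    for _ in range(depth):
        x = x + f(x) / depth
    return x

x0 = rng.normal(size=d)
reference = resnet_forward(x0, 100_000)     # near-continuum ODE solution

for depth in (10, 100, 1000):
    gap = np.linalg.norm(resnet_forward(x0, depth) - reference)
    print(f"depth {depth:5d}: distance to ODE solution = {gap:.2e}")
```

The gap should shrink roughly like 1/depth, which is the sense in which neural ODEs are the continuous-depth limit of residual networks.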
There has been a lot of recent interest in characterizing the error surface of deep models. This stems from a long-standing question: given that deep networks are highly …
P Kidger - arXiv preprint arXiv:2202.02435, 2022 - arxiv.org
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks …
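A sketch of the NDE viewpoint in this snippet, under illustrative choices: the network is used as the vector field of an ODE, and the forward pass is delegated to a numerical solver. Libraries such as torchdiffeq and diffrax package production solvers; the hand-rolled fixed-step RK4 below is only a self-contained stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                    # feature dimension (illustrative)

# The "network" is just an ODE vector field f(t, x); one tanh layer here.
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
b = rng.normal(size=d)

def f(t, x):
    return np.tanh(W @ x + t * b)        # time dependence enters via the bias

def rk4_solve(f, x0, t0=0.0, t1=1.0, steps=50):
    """Classic fixed-step Runge-Kutta 4: the NDE forward pass is just a
    call to a numerical ODE solver with the network as the vector field."""
    x, t = x0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x = x + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

x0 = rng.normal(size=d)
print(rk4_solve(f, x0))                  # output features for input x0
```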
S Du, J Lee, H Li, L Wang… - International Conference on Machine Learning, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves …
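A toy run of the phenomenon this line of work analyzes (not the paper's exact setting or proof): on a small dataset with random labels, plain gradient descent on an over-parameterized two-layer ReLU network drives the nonconvex training loss toward zero. Width, learning rate, and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 500               # samples, input dim, hidden width (illustrative)

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm inputs
y = rng.normal(size=n)                           # random labels: pure memorization

W = rng.normal(size=(m, d))                      # trained first layer
a = rng.choice([-1.0, 1.0], size=m)              # fixed +/-1 output layer

lr = 0.05
for step in range(2001):
    pre = X @ W.T                                # (n, m) pre-activations
    out = np.maximum(pre, 0.0) @ a / np.sqrt(m)  # two-layer ReLU network
    r = out - y                                  # residuals
    if step % 500 == 0:
        print(f"step {step:4d}: training loss = {0.5 * (r @ r):.3e}")
    # dL/dw_j = sum_i r_i * (a_j / sqrt(m)) * 1[pre_ij > 0] * x_i
    W -= lr * ((r[:, None] * (pre > 0)) * (a / np.sqrt(m))).T @ X
```

Over-parameterization (m much larger than n) is what keeps the trajectory in a benign region where the loss decreases geometrically, which is the intuition the theory makes precise.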
Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural …
K Kawaguchi - Advances in Neural Information Processing Systems, 2016 - proceedings.neurips.cc
In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. For an expected …
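Kawaguchi's result covers deep linear networks: despite the nonconvex factored parameterization, every local minimum of the squared loss is a global minimum. A minimal numerical check under illustrative dimensions and hyperparameters: gradient descent on a depth-2 linear network should land at essentially the same loss value as the convex least-squares optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, h, dy, n = 5, 8, 3, 50         # input, hidden, output dims; samples (illustrative)

X = rng.normal(size=(dx, n))
Y = rng.normal(size=(dy, n))

# Global optimum of the convex problem min_A (1/2n) ||A X - Y||_F^2 ...
A_star, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
best = 0.5 / n * np.linalg.norm(A_star.T @ X - Y) ** 2

# ... versus gradient descent on the nonconvex factored loss
# L(W2, W1) = (1/2n) ||W2 W1 X - Y||_F^2 of a depth-2 linear network.
W1 = rng.normal(scale=0.1, size=(h, dx))
W2 = rng.normal(scale=0.1, size=(dy, h))
lr = 0.1
for _ in range(20000):
    R = (W2 @ W1 @ X - Y) / n      # scaled residual of the linear network
    g2 = R @ (W1 @ X).T            # dL/dW2
    g1 = W2.T @ R @ X.T            # dL/dW1
    W2 -= lr * g2
    W1 -= lr * g1

loss = 0.5 / n * np.linalg.norm(W2 @ W1 @ X - Y) ** 2
print(f"gradient descent: {loss:.6f}   global optimum: {best:.6f}")
```

The two printed values should nearly coincide, consistent with the landscape having no poor local minima in the deep linear case.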
L Lu, Y Su, GE Karniadakis - arXiv preprint arXiv:1808.04947, 2018 - arxiv.org
Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the …
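The truncated snippet appears to point at the collapse phenomenon this paper studies: with fixed small width, a randomly initialized deep ReLU network degenerates to a constant function with probability growing in depth (the "dying ReLU" failure). A rough empirical sketch; the probe grid, zero biases, and He-style initialization are illustrative choices and differ from the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def collapses(depth, width=2, n_probe=200):
    """Heuristic check: does a randomly initialized deep, narrow ReLU net
    compute a constant function on a 1-D probe grid?"""
    h = np.linspace(-3.0, 3.0, n_probe).reshape(-1, 1)
    fan_in = 1
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, width))
        h = np.maximum(h @ W, 0.0)       # ReLU layer with zero bias
        fan_in = width
    out = h @ rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, 1))
    return np.allclose(out, out[0])      # constant output on the grid?

for depth in (3, 10, 30):
    rate = np.mean([collapses(depth) for _ in range(500)])
    print(f"depth {depth:2d}, width 2: ~{rate:.0%} of random inits collapse")
```

The estimated collapse rate should climb toward 100% as depth grows at fixed width, which is the qualitative behavior the paper establishes.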