Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
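A toy version of the phenomenon these two papers analyze can be run in a few lines: an over-parameterized two-layer ReLU network trained by full-batch gradient descent on random data, with the squared training loss shrinking steadily. The numpy sketch below freezes the output layer and trains only the first layer, a common simplification in this line of analysis; the width, step size, and data are arbitrary choices, and this is not the papers' own construction.

```python
# Over-parameterized two-layer ReLU network f(x) = a^T relu(W x) / sqrt(m),
# trained by full-batch gradient descent on the squared loss.
# Only W is trained; a is frozen at random signs (a common analysis setting).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                     # n samples, input dim d, hidden width m >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))           # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)       # output weights (frozen)
lr = 0.02

for step in range(2001):
    pre = X @ W.T                          # (n, m) pre-activations
    pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)
    resid = pred - y
    if step % 500 == 0:
        print(step, 0.5 * np.sum(resid ** 2))   # training loss keeps shrinking
    # gradient of 0.5 * ||pred - y||^2 with respect to W
    grad_W = ((resid[:, None] * (pre > 0) * a[None, :]).T @ X) / np.sqrt(m)
    W -= lr * grad_W
```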

Visualizing the loss landscape of neural nets

H Li, Z Xu, G Taylor, C Studer… - Advances in neural …, 2018 - proceedings.neurips.cc
Neural network training relies on our ability to find "good" minimizers of highly non-convex
loss functions. It is well known that certain network architecture designs (e.g., skip …
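The visualization idea is easiest to see on a toy model: pick two random directions, perturb the trained weights along them, and evaluate the loss on a 2-D grid. The sketch below illustrates that idea in numpy on a linear least-squares model; the paper's filter-wise normalization is only crudely approximated by rescaling the directions, and all sizes are placeholders.

```python
# Evaluate the training loss of a fitted model on a 2-D slice
# theta + alpha * delta + beta * eta through two random directions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

theta = np.linalg.lstsq(X, y, rcond=None)[0]      # "trained" weights of a linear model

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

delta = rng.standard_normal(3)
eta = rng.standard_normal(3)
# crude stand-in for the paper's filter-wise normalization: match the weight norm
delta *= np.linalg.norm(theta) / np.linalg.norm(delta)
eta *= np.linalg.norm(theta) / np.linalg.norm(eta)

alphas = np.linspace(-1.0, 1.0, 21)
betas = np.linspace(-1.0, 1.0, 21)
surface = np.array([[loss(theta + a * delta + b * eta) for b in betas] for a in alphas])
print(surface.min(), surface.max())               # surface can be fed to a contour plot
```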

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

Geometric deep learning: going beyond Euclidean data

MM Bronstein, J Bruna, Y LeCun… - IEEE Signal …, 2017 - ieeexplore.ieee.org
Geometric deep learning is an umbrella term for emerging techniques attempting to
generalize (structured) deep neural models to non-Euclidean domains, such as graphs and …
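As one concrete instance of the models covered by this umbrella term, the sketch below implements a single graph convolutional layer in the normalized-adjacency form H' = relu(D^-1/2 (A + I) D^-1/2 H W), in the style of Kipf and Welling, in plain numpy. The graph, feature sizes, and weights are placeholders; this is an illustration, not code from the survey.

```python
# One graph convolutional layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W).
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # adjacency of a 4-node cycle graph
H = rng.standard_normal((4, 8))                # per-node input features
W = rng.standard_normal((8, 16))               # learnable layer weights

A_hat = A + np.eye(4)                          # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H_next = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
print(H_next.shape)                            # (4, 16): updated per-node features
```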

Adam optimization algorithm for wide and deep neural network

IKM Jais, AR Ismail, SQ Nisa - Knowl. Eng. Data Sci., 2019 - pdfs.semanticscholar.org
“Memorization can be loosely defined as learning the frequent co-occurrence of items or
features and exploiting the correlation available in the historical data. Generalization, on the …
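For reference, the Adam update this title refers to combines bias-corrected first- and second-moment estimates of the gradient. The sketch below shows the standard update rule with its usual defaults (beta1=0.9, beta2=0.999, eps=1e-8) applied to a toy quadratic; the learning rate and objective are illustrative and not taken from the paper.

```python
# One Adam step with the standard defaults, applied repeatedly to a toy quadratic.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                             # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                                     # ends up near the minimizer [0, 0]
```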

On the optimization of deep networks: Implicit acceleration by overparameterization

S Arora, N Cohen, E Hazan - International conference on …, 2018 - proceedings.mlr.press
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that, sometimes …
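The kind of over-parameterization studied here can be illustrated by fitting the same linear regression either directly or through a product of two weight matrices. The numpy sketch below sets up both parameterizations and trains them with plain gradient descent; it only mirrors the setting, makes no claim about the paper's acceleration results, and uses arbitrary sizes and step sizes.

```python
# The same linear regression, fit (a) directly and (b) through a depth-2 linear
# "network" y ~ X (W1 @ w2), both trained with plain gradient descent.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10)

def mse(pred):
    return 0.5 * float(np.mean((pred - y) ** 2))

w = np.zeros(10)                                 # (a) direct parameterization
W1 = 0.1 * rng.standard_normal((10, 10))         # (b) over-parameterized factors
w2 = 0.1 * rng.standard_normal(10)

lr = 0.1
for step in range(300):
    # (a) direct
    w -= lr * X.T @ (X @ w - y) / len(y)
    # (b) deep linear: gradients through the product W1 @ w2
    g_eff = X.T @ (X @ (W1 @ w2) - y) / len(y)
    grad_W1 = np.outer(g_eff, w2)
    grad_w2 = W1.T @ g_eff
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

print(mse(X @ w), mse(X @ (W1 @ w2)))            # final training losses of (a) and (b)
```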

Convergence analysis of two-layer neural networks with ReLU activation

Y Li, Y Yuan - Advances in neural information processing …, 2017 - proceedings.neurips.cc
In recent years, stochastic gradient descent (SGD) based techniques have become the
standard tools for training neural networks. However, formal theoretical understanding of …

Recovery guarantees for one-hidden-layer neural networks

K Zhong, Z Song, P Jain, PL Bartlett… - … on machine learning, 2017 - proceedings.mlr.press
In this paper, we consider regression problems with one-hidden-layer neural networks
(1NNs). We distill some properties of activation functions that lead to local strong convexity …

Spurious local minima are common in two-layer ReLU neural networks

I Safran, O Shamir - International conference on machine …, 2018 - proceedings.mlr.press
We consider the optimization problem associated with training simple ReLU neural networks
of the form $\mathbf{x} \mapsto \sum_{i=1}^{k} \max\{0, \mathbf{w}_i^\top \mathbf{x}\}$ with …
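To make the objective concrete, the sketch below instantiates the loss in question: a student network of the stated form fit with squared loss to a same-architecture teacher on Gaussian inputs, optimized by a few plain gradient steps. The teacher/student setup, sizes, and step size are illustrative; this reproduces none of the paper's results about spurious minima.

```python
# The loss in question: a student f(x; W) = sum_i relu(w_i . x) fit with squared
# loss to a same-architecture teacher on Gaussian inputs, via plain gradient descent.
import numpy as np

rng = np.random.default_rng(4)
d, k, n = 10, 6, 512
X = rng.standard_normal((n, d))                  # Gaussian inputs
W_teacher = rng.standard_normal((k, d))
y = np.maximum(X @ W_teacher.T, 0.0).sum(axis=1)

W = rng.standard_normal((k, d))                  # student weights

def loss_and_grad(W):
    pre = X @ W.T                                # (n, k)
    resid = np.maximum(pre, 0.0).sum(axis=1) - y
    grad = ((resid[:, None] * (pre > 0)).T @ X) / n
    return 0.5 * float(np.mean(resid ** 2)), grad

lr = 1e-2
for step in range(200):
    value, grad = loss_and_grad(W)
    W -= lr * grad
print(value)                                     # the non-convex training loss
```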