Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
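A toy version of the phenomenon these two papers analyze can be run in a few lines: an over-parameterized two-layer ReLU network trained by full-batch gradient descent on random data, with the squared training loss shrinking steadily. The numpy sketch below freezes the output layer and trains only the first layer, a common simplification in this line of analysis; the width, step size, and data are arbitrary choices, and this is not the papers' own construction.

```python
# Over-parameterized two-layer ReLU network f(x) = a^T relu(W x) / sqrt(m),
# trained by full-batch gradient descent on the squared loss.
# Only W is trained; a is frozen at random signs (a common analysis setting).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                     # n samples, input dim d, hidden width m >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))           # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)       # output weights (frozen)
lr = 0.02

for step in range(2001):
    pre = X @ W.T                          # (n, m) pre-activations
    pred = np.maximum(pre, 0.0) @ a / np.sqrt(m)
    resid = pred - y
    if step % 500 == 0:
        print(step, 0.5 * np.sum(resid ** 2))   # training loss keeps shrinking
    # gradient of 0.5 * ||pred - y||^2 with respect to W
    grad_W = ((resid[:, None] * (pre > 0) * a[None, :]).T @ X) / np.sqrt(m)
    W -= lr * grad_W
```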

Visualizing the loss landscape of neural nets

H Li, Z Xu, G Taylor, C Studer… - Advances in neural …, 2018 - proceedings.neurips.cc
Neural network training relies on our ability to find "good" minimizers of highly non-convex
loss functions. It is well known that certain network architecture designs (e.g., skip …
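The visualization idea is easiest to see on a toy model: pick two random directions, perturb the trained weights along them, and evaluate the loss on a 2-D grid. The sketch below illustrates that idea in numpy on a linear least-squares model; the paper's filter-wise normalization is only crudely approximated by rescaling the directions, and all sizes are placeholders.

```python
# Evaluate the training loss of a fitted model on a 2-D slice
# theta + alpha * delta + beta * eta through two random directions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

theta = np.linalg.lstsq(X, y, rcond=None)[0]      # "trained" weights of a linear model

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

delta = rng.standard_normal(3)
eta = rng.standard_normal(3)
# crude stand-in for the paper's filter-wise normalization: match the weight norm
delta *= np.linalg.norm(theta) / np.linalg.norm(delta)
eta *= np.linalg.norm(theta) / np.linalg.norm(eta)

alphas = np.linspace(-1.0, 1.0, 21)
betas = np.linspace(-1.0, 1.0, 21)
surface = np.array([[loss(theta + a * delta + b * eta) for b in betas] for a in alphas])
print(surface.min(), surface.max())               # surface can be fed to a contour plot
```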

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

Geometric deep learning: going beyond Euclidean data

MM Bronstein, J Bruna, Y LeCun… - IEEE Signal …, 2017 - ieeexplore.ieee.org
Geometric deep learning is an umbrella term for emerging techniques attempting to
generalize (structured) deep neural models to non-Euclidean domains, such as graphs and …
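As one concrete instance of the models covered by this umbrella term, the sketch below implements a single graph convolutional layer in the normalized-adjacency form H' = relu(D^-1/2 (A + I) D^-1/2 H W), in the style of Kipf and Welling, in plain numpy. The graph, feature sizes, and weights are placeholders; this is an illustration, not code from the survey.

```python
# One graph convolutional layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W).
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # adjacency of a 4-node cycle graph
H = rng.standard_normal((4, 8))                # per-node input features
W = rng.standard_normal((8, 16))               # learnable layer weights

A_hat = A + np.eye(4)                          # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H_next = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
print(H_next.shape)                            # (4, 16): updated per-node features
```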

Adam optimization algorithm for wide and deep neural network

IKM Jais, AR Ismail, SQ Nisa - Knowl. Eng. Data Sci., 2019 - pdfs.semanticscholar.org
“Memorization can be loosely defined as learning the frequent co-occurrence of items or
features and exploiting the correlation available in the historical data. Generalization, on the …
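For reference, the Adam update this title refers to combines bias-corrected first- and second-moment estimates of the gradient. The sketch below shows the standard update rule with its usual defaults (beta1=0.9, beta2=0.999, eps=1e-8) applied to a toy quadratic; the learning rate and objective are illustrative and not taken from the paper.

```python
# One Adam step with the standard defaults, applied repeatedly to a toy quadratic.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                             # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                                     # ends up near the minimizer [0, 0]
```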

On the optimization of deep networks: Implicit acceleration by overparameterization

S Arora, N Cohen, E Hazan - International conference on …, 2018 - proceedings.mlr.press
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that, sometimes …
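The kind of over-parameterization studied here can be illustrated by fitting the same linear regression either directly or through a product of two weight matrices. The numpy sketch below sets up both parameterizations and trains them with plain gradient descent; it only mirrors the setting, makes no claim about the paper's acceleration results, and uses arbitrary sizes and step sizes.

```python
# The same linear regression, fit (a) directly and (b) through a depth-2 linear
# "network" y ~ X (W1 @ w2), both trained with plain gradient descent.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10)

def mse(pred):
    return 0.5 * float(np.mean((pred - y) ** 2))

w = np.zeros(10)                                 # (a) direct parameterization
W1 = 0.1 * rng.standard_normal((10, 10))         # (b) over-parameterized factors
w2 = 0.1 * rng.standard_normal(10)

lr = 0.1
for step in range(300):
    # (a) direct
    w -= lr * X.T @ (X @ w - y) / len(y)
    # (b) deep linear: gradients through the product W1 @ w2
    g_eff = X.T @ (X @ (W1 @ w2) - y) / len(y)
    grad_W1 = np.outer(g_eff, w2)
    grad_w2 = W1.T @ g_eff
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

print(mse(X @ w), mse(X @ (W1 @ w2)))            # final training losses of (a) and (b)
```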

Convergence analysis of two-layer neural networks with ReLU activation

Y Li, Y Yuan - Advances in neural information processing …, 2017 - proceedings.neurips.cc
In recent years, stochastic gradient descent (SGD) based techniques have become the
standard tools for training neural networks. However, formal theoretical understanding of …

Recovery guarantees for one-hidden-layer neural networks

K Zhong, Z Song, P Jain, PL Bartlett… - … on machine learning, 2017 - proceedings.mlr.press
In this paper, we consider regression problems with one-hidden-layer neural networks
(1NNs). We distill some properties of activation functions that lead to local strong convexity …

Spurious local minima are common in two-layer ReLU neural networks

I Safran, O Shamir - International conference on machine …, 2018 - proceedings.mlr.press
We consider the optimization problem associated with training simple ReLU neural networks
of the form $\mathbf{x} \mapsto \sum_{i=1}^{k} \max\{0, \mathbf{w}_i^\top \mathbf{x}\}$ with …
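To make the objective concrete, the sketch below instantiates the loss in question: a student network of the stated form fit with squared loss to a same-architecture teacher on Gaussian inputs, optimized by a few plain gradient steps. The teacher/student setup, sizes, and step size are illustrative; this reproduces none of the paper's results about spurious minima.

```python
# The loss in question: a student f(x; W) = sum_i relu(w_i . x) fit with squared
# loss to a same-architecture teacher on Gaussian inputs, via plain gradient descent.
import numpy as np

rng = np.random.default_rng(4)
d, k, n = 10, 6, 512
X = rng.standard_normal((n, d))                  # Gaussian inputs
W_teacher = rng.standard_normal((k, d))
y = np.maximum(X @ W_teacher.T, 0.0).sum(axis=1)

W = rng.standard_normal((k, d))                  # student weights

def loss_and_grad(W):
    pre = X @ W.T                                # (n, k)
    resid = np.maximum(pre, 0.0).sum(axis=1) - y
    grad = ((resid[:, None] * (pre > 0)).T @ X) / n
    return 0.5 * float(np.mean(resid ** 2)), grad

lr = 1e-2
for step in range(200):
    value, grad = loss_and_grad(W)
    W -= lr * grad
print(value)                                     # the non-convex training loss
```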