Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
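
The factored formulation behind this line of work can be illustrated with a short sketch: run gradient descent directly on the nonconvex objective 0.5*||U V^T - M||_F^2. The problem size, initialization scale, step size, and iteration count below are illustrative assumptions, not values taken from the survey.

```python
# Minimal sketch (assumed setup): gradient descent on the factored objective
# f(U, V) = 0.5 * ||U V^T - M||_F^2 for a synthetic rank-r matrix M.
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # ground-truth rank-r matrix
M /= np.linalg.norm(M, 2)                       # normalize spectral norm for a stable step size

U = 0.1 * rng.standard_normal((n, r))           # small random initialization
V = 0.1 * rng.standard_normal((n, r))
eta = 0.1                                       # step size (illustrative choice)

for _ in range(3000):
    R = U @ V.T - M                             # residual U V^T - M
    U, V = U - eta * R @ V, V - eta * R.T @ U   # gradient steps on U and V

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```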

Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Understanding self-supervised learning dynamics without contrastive pairs

Y Tian, X Chen, S Ganguli - International Conference on …, 2021 - proceedings.mlr.press
While contrastive approaches to self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
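
A minimal sketch of the positive-pair term that this snippet describes: pull the normalized embeddings of two augmented views of the same input together (contrastive methods additionally repel negative pairs, which is the ingredient this paper studies removing). The embedding dimension and placeholder "views" below are assumptions for illustration only.

```python
# Minimal sketch (assumed setup): negative cosine similarity between the
# embeddings of two augmented views of the same data point (a positive pair).
import numpy as np

def positive_pair_loss(z1, z2):
    """Lower is better: -1 means the two views' embeddings are perfectly aligned."""
    z1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=-1, keepdims=True)
    return -np.mean(np.sum(z1 * z2, axis=-1))

rng = np.random.default_rng(0)
z_view1 = rng.standard_normal((8, 16))                  # embeddings of 8 samples, dim 16
z_view2 = z_view1 + 0.1 * rng.standard_normal((8, 16))  # stand-in for the second augmented view
print(positive_pair_loss(z_view1, z_view2))             # close to -1 for well-aligned pairs
```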

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
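
A toy illustration of the claim in this snippet, assuming the common analysis setup of a wide two-layer ReLU network with fixed ±1 output weights, squared loss, and full-batch gradient descent; the width, data, labels, and step size below are arbitrary illustrative choices, not the paper's constants.

```python
# Minimal sketch (assumed setup): an over-parameterized two-layer ReLU network
# trained with full-batch gradient descent typically drives the training loss
# toward zero, even for random labels.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 10, 500                              # few data points, many hidden units
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs
y = rng.standard_normal(n)                         # arbitrary (random) labels

W = rng.standard_normal((m, d))                    # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m)                # fixed +/-1 output layer
eta = 0.5                                          # step size (illustrative)

for _ in range(2000):
    pre = X @ W.T                                  # (n, m) pre-activations
    err = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y          # residuals f(x_i) - y_i
    grad = ((err[:, None] * (pre > 0)).T @ X) * a[:, None] / np.sqrt(m)
    W -= eta * grad                                # full-batch gradient step

pred = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)
print("training loss:", 0.5 * np.sum((pred - y) ** 2))
```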

Gradient descent optimizes over-parameterized deep ReLU networks

D Zou, Y Cao, D Zhou, Q Gu - Machine learning, 2020 - Springer
We study the problem of training deep fully connected neural networks with the Rectified Linear
Unit (ReLU) activation function and the cross-entropy loss for binary classification using …

Dying ReLU and initialization: Theory and numerical examples

L Lu, Y Shin, Y Su, GE Karniadakis - arXiv preprint arXiv:1903.06733, 2019 - arxiv.org
The dying ReLU refers to the problem of ReLU neurons becoming inactive and outputting 0 for
every input. There are many empirical and heuristic explanations of why ReLU neurons …
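
A minimal sketch of the phenomenon described above, under the assumption that a unit's bias has been pushed far negative: its pre-activation is then negative on every input, so it outputs 0 everywhere and receives no gradient through the ReLU, which is the "dying" behavior. The data and the bias value are placeholders.

```python
# Minimal sketch (assumed setup): a "dead" ReLU unit whose pre-activation
# w.x + b is negative for every input outputs 0 and gets zero gradient,
# so gradient descent cannot revive it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))             # inputs
w, b = rng.standard_normal(4), -100.0          # weights and a pathologically negative bias

pre = X @ w + b                                # pre-activations, all negative here
out = np.maximum(pre, 0.0)                     # ReLU output: identically 0
grad_mask = (pre > 0).astype(float)            # derivative of ReLU wrt its pre-activation

print(out.max(), grad_mask.sum())              # 0.0 and 0.0: no output, no gradient signal
```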

A geometric analysis of neural collapse with unconstrained features

Z Zhu, T Ding, J Zhou, X Li, C You… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first global optimization landscape analysis of Neural Collapse--an intriguing
empirical phenomenon that arises in the last-layer classifiers and features of neural …

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …