Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
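
The factored formulation behind this line of work can be illustrated with a short sketch: run gradient descent directly on the nonconvex objective 0.5*||U V^T - M||_F^2. The problem size, initialization scale, step size, and iteration count below are illustrative assumptions, not values taken from the survey.

```python
# Minimal sketch (assumed setup): gradient descent on the factored objective
# f(U, V) = 0.5 * ||U V^T - M||_F^2 for a synthetic rank-r matrix M.
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # ground-truth rank-r matrix
M /= np.linalg.norm(M, 2)                       # normalize spectral norm for a stable step size

U = 0.1 * rng.standard_normal((n, r))           # small random initialization
V = 0.1 * rng.standard_normal((n, r))
eta = 0.1                                       # step size (illustrative choice)

for _ in range(3000):
    R = U @ V.T - M                             # residual U V^T - M
    U, V = U - eta * R @ V, V - eta * R.T @ U   # gradient steps on U and V

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```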

Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Understanding self-supervised learning dynamics without contrastive pairs

Y Tian, X Chen, S Ganguli - International Conference on …, 2021 - proceedings.mlr.press
While contrastive approaches to self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
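
A minimal sketch of the positive-pair term that this snippet describes: pull the normalized embeddings of two augmented views of the same input together (contrastive methods additionally repel negative pairs, which is the ingredient this paper studies removing). The embedding dimension and placeholder "views" below are assumptions for illustration only.

```python
# Minimal sketch (assumed setup): negative cosine similarity between the
# embeddings of two augmented views of the same data point (a positive pair).
import numpy as np

def positive_pair_loss(z1, z2):
    """Lower is better: -1 means the two views' embeddings are perfectly aligned."""
    z1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=-1, keepdims=True)
    return -np.mean(np.sum(z1 * z2, axis=-1))

rng = np.random.default_rng(0)
z_view1 = rng.standard_normal((8, 16))                  # embeddings of 8 samples, dim 16
z_view2 = z_view1 + 0.1 * rng.standard_normal((8, 16))  # stand-in for the second augmented view
print(positive_pair_loss(z_view1, z_view2))             # close to -1 for well-aligned pairs
```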

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
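
A toy illustration of the claim in this snippet, assuming the common analysis setup of a wide two-layer ReLU network with fixed ±1 output weights, squared loss, and full-batch gradient descent; the width, data, labels, and step size below are arbitrary illustrative choices, not the paper's constants.

```python
# Minimal sketch (assumed setup): an over-parameterized two-layer ReLU network
# trained with full-batch gradient descent typically drives the training loss
# toward zero, even for random labels.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 10, 500                              # few data points, many hidden units
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs
y = rng.standard_normal(n)                         # arbitrary (random) labels

W = rng.standard_normal((m, d))                    # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m)                # fixed +/-1 output layer
eta = 0.5                                          # step size (illustrative)

for _ in range(2000):
    pre = X @ W.T                                  # (n, m) pre-activations
    err = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y          # residuals f(x_i) - y_i
    grad = ((err[:, None] * (pre > 0)).T @ X) * a[:, None] / np.sqrt(m)
    W -= eta * grad                                # full-batch gradient step

pred = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)
print("training loss:", 0.5 * np.sum((pred - y) ** 2))
```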

Gradient descent optimizes over-parameterized deep ReLU networks

D Zou, Y Cao, D Zhou, Q Gu - Machine learning, 2020 - Springer
We study the problem of training deep fully connected neural networks with the Rectified Linear
Unit (ReLU) activation function and the cross-entropy loss for binary classification using …

Dying ReLU and initialization: Theory and numerical examples

L Lu, Y Shin, Y Su, GE Karniadakis - arXiv preprint arXiv:1903.06733, 2019 - arxiv.org
The dying ReLU refers to the problem of ReLU neurons becoming inactive and outputting 0 for
every input. There are many empirical and heuristic explanations of why ReLU neurons …
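
A minimal sketch of the phenomenon described above, under the assumption that a unit's bias has been pushed far negative: its pre-activation is then negative on every input, so it outputs 0 everywhere and receives no gradient through the ReLU, which is the "dying" behavior. The data and the bias value are placeholders.

```python
# Minimal sketch (assumed setup): a "dead" ReLU unit whose pre-activation
# w.x + b is negative for every input outputs 0 and gets zero gradient,
# so gradient descent cannot revive it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))             # inputs
w, b = rng.standard_normal(4), -100.0          # weights and a pathologically negative bias

pre = X @ w + b                                # pre-activations, all negative here
out = np.maximum(pre, 0.0)                     # ReLU output: identically 0
grad_mask = (pre > 0).astype(float)            # derivative of ReLU wrt its pre-activation

print(out.max(), grad_mask.sum())              # 0.0 and 0.0: no output, no gradient signal
```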

A geometric analysis of neural collapse with unconstrained features

Z Zhu, T Ding, J Zhou, X Li, C You… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first global optimization landscape analysis of Neural Collapse--an intriguing
empirical phenomenon that arises in the last-layer classifiers and features of neural …

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …