Overview frequency principle/spectral bias in deep learning

ZQJ Xu, Y Zhang, T Luo - Communications on Applied Mathematics and …, 2024 - Springer
Understanding deep learning is increasingly important as it penetrates more and more into
industry and science. In recent years, a line of research based on Fourier analysis has shed light on …
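
The frequency principle (or spectral bias) surveyed in this overview says that networks trained by gradient-based methods tend to fit low-frequency components of the target before high-frequency ones. Below is a minimal numpy sketch of the phenomenon, not taken from the paper; the two-layer tanh network, target frequencies, and learning rate are illustrative assumptions. It trains on a 1-D target mixing a low and a high frequency and prints the residual energy at each frequency.

```python
import numpy as np

# Illustrative sketch of the frequency principle: a small two-layer tanh
# network is trained by full-batch gradient descent on a 1-D target mixing
# a low frequency (k = 1) and a high frequency (k = 5). The printed residual
# energies typically show the k = 1 component shrinking first.
rng = np.random.default_rng(0)
n, m, lr = 256, 200, 0.05                     # samples, hidden width, step size
x = np.linspace(-np.pi, np.pi, n)[:, None]
y = np.sin(x) + np.sin(5 * x)                 # low-frequency + high-frequency target

W1 = rng.normal(0.0, 1.0, (1, m))             # input weights
b1 = np.zeros(m)                              # hidden biases
W2 = rng.normal(0.0, 0.1, (m, 1))             # output weights

def residual_energy(residual, k):
    """Magnitude of the residual's projection onto sin(kx)."""
    return abs((residual * np.sin(k * x)).sum()) / n

for step in range(10001):
    h = np.tanh(x @ W1 + b1)                  # hidden activations, shape (n, m)
    err = h @ W2 - y                          # residual, shape (n, 1)
    gW2 = h.T @ err / n                       # gradients of the MSE loss
    gh = (err @ W2.T) * (1.0 - h ** 2)
    gW1, gb1 = x.T @ gh / n, gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2
    if step % 2000 == 0:
        print(f"step {step:5d}  err@k=1: {residual_energy(err, 1):.3f}  "
              f"err@k=5: {residual_energy(err, 5):.3f}")
```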

Understanding multi-phase optimization dynamics and rich nonlinear behaviors of relu networks

M Wang, C Ma - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of the models and the non-convexity of the loss pose significant …

Empirical phase diagram for three-layer neural networks with infinite width

H Zhou, Q Zhou, Z Jin, T Luo… - Advances in Neural …, 2022 - proceedings.neurips.cc
Substantial work indicates that the dynamics of neural networks (NNs) are closely related to
the initialization of their parameters. Inspired by the phase diagram for two-layer ReLU NNs with …

Embedding principle of loss landscape of deep neural networks

Y Zhang, Z Zhang, T Luo, ZJ Xu - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously
important. In this work, we prove an embedding principle that the loss landscape of a DNN …
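
The embedding principle referenced here rests on explicit embeddings from a narrower network's parameter space into a wider one, the simplest being neuron splitting. A quick numpy check of that construction (my own illustration, not the paper's code): duplicating a hidden neuron and splitting its output weight into two parts that sum to the original leaves the network function, and hence the loss, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 3))                      # a batch of inputs
relu = lambda z: np.maximum(z, 0.0)

# Narrow two-layer ReLU network: f(x) = sum_k a_k * relu(w_k . x).
W = rng.normal(size=(3, 4))                       # input weights of 4 hidden neurons
a = rng.normal(size=(4, 1))                       # output weights
f_narrow = relu(x @ W) @ a

# One-step embedding ("neuron splitting"): copy neuron 0's input weight and
# split its output weight into alpha * a_0 and (1 - alpha) * a_0.
alpha = 0.3
W_wide = np.hstack([W, W[:, :1]])                 # 5 neurons; the last duplicates neuron 0
a_wide = np.vstack([a, (1 - alpha) * a[:1]])
a_wide[0] *= alpha                                # the two copies' output weights sum to a_0
f_wide = relu(x @ W_wide) @ a_wide

# The wider network computes exactly the same function.
print(np.allclose(f_narrow, f_wide))              # True
```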

Mathematical introduction to deep learning: methods, implementations, and theory

A Jentzen, B Kuckuck, P von Wurstemberger - arXiv preprint arXiv …, 2023 - arxiv.org
This book aims to provide an introduction to the topic of deep learning algorithms. We review
essential components of deep learning algorithms in full mathematical detail including …

Implicit regularization of dropout

Z Zhang, ZQJ Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
It is important to understand how dropout, a popular regularization method, helps neural
network training reach a solution that generalizes well. In this work, we present a …
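
For context on the object being analyzed: dropout randomly zeroes each hidden unit during training and rescales the survivors so the expected activation is preserved, then acts as the identity at evaluation time. A generic sketch of this standard "inverted dropout" layer (a common implementation, not the specific model studied in the paper):

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected output equals the
    input; at evaluation time the layer is the identity."""
    if not training or p == 0.0:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p               # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)

h = np.ones((4, 8))
print(dropout(h, p=0.5, rng=np.random.default_rng(0)))   # ~half zeros, the rest equal 2
print(dropout(h, p=0.5, training=False))                  # unchanged at test time
```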

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

A Jentzen, A Riekert - arXiv preprint arXiv:2112.09684, 2021 - arxiv.org
In this article, we study fully-connected feedforward deep ReLU ANNs with an arbitrarily
large number of hidden layers, and we prove convergence of the risk of the GD optimization …

Towards understanding the condensation of neural networks at initial training

H Zhou, Q Zhou, T Luo, Y Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Empirical work shows that for ReLU neural networks (NNs) with small initialization, the input
weights of hidden neurons (the input weight of a hidden neuron consists of the weight from …
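
Condensation, the phenomenon studied here, means that with small initialization the input weight vectors of hidden neurons align along only a few directions during early training. A rough numpy illustration under my own assumptions (two-layer ReLU network without biases, Gaussian data, a smooth target, small initialization scale): after training, many pairs of neurons' input weights end up nearly parallel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 64, 2, 50                               # samples, input dim, hidden width
x = rng.normal(size=(n, d))
y = np.tanh(x @ rng.normal(size=(d, 1)))          # a smooth target function
relu = lambda z: np.maximum(z, 0.0)

scale = 1e-2                                      # small initialization scale
W = rng.normal(0.0, scale, (d, m))                # input weights (no biases)
a = rng.normal(0.0, scale, (m, 1))                # output weights
lr = 0.2

for step in range(20000):                         # full-batch gradient descent on MSE
    pre = x @ W
    h = relu(pre)
    err = h @ a - y
    ga = h.T @ err / n
    gW = x.T @ ((err @ a.T) * (pre > 0)) / n
    W -= lr * gW
    a -= lr * ga

# Cosine similarity between the input weight vectors of all neuron pairs:
# under condensation, most pairs are nearly parallel (|cos| close to 1).
u = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
cos = u.T @ u
off_diag = cos[~np.eye(m, dtype=bool)]
print("fraction of neuron pairs with |cos| > 0.99:", (np.abs(off_diag) > 0.99).mean())
```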

Loss spike in training neural networks

X Li, ZQJ Xu, Z Zhang - arXiv preprint arXiv:2305.12133, 2023 - arxiv.org
In this work, we investigate the mechanism underlying loss spikes observed during neural
network training. When the training enters a region with a lower-loss-as-sharper (LLAS) …

Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non …

S Ibragimov, A Jentzen, A Riekert - arXiv preprint arXiv:2212.13111, 2022 - arxiv.org
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) are
nowadays among the most heavily used computational schemes in the digital world. Despite …