Overview frequency principle/spectral bias in deep learning

ZQJ Xu, Y Zhang, T Luo - Communications on Applied Mathematics and …, 2024 - Springer
Understanding deep learning is increasingly important as it penetrates more and more into
industry and science. In recent years, a line of research based on Fourier analysis has shed light on …
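
The frequency principle (or spectral bias) surveyed in this overview says that networks trained by gradient-based methods tend to fit low-frequency components of the target before high-frequency ones. Below is a minimal numpy sketch of the phenomenon, not taken from the paper; the two-layer tanh network, target frequencies, and learning rate are illustrative assumptions. It trains on a 1-D target mixing a low and a high frequency and prints the residual energy at each frequency.

```python
import numpy as np

# Illustrative sketch of the frequency principle: a small two-layer tanh
# network is trained by full-batch gradient descent on a 1-D target mixing
# a low frequency (k = 1) and a high frequency (k = 5). The printed residual
# energies typically show the k = 1 component shrinking first.
rng = np.random.default_rng(0)
n, m, lr = 256, 200, 0.05                     # samples, hidden width, step size
x = np.linspace(-np.pi, np.pi, n)[:, None]
y = np.sin(x) + np.sin(5 * x)                 # low-frequency + high-frequency target

W1 = rng.normal(0.0, 1.0, (1, m))             # input weights
b1 = np.zeros(m)                              # hidden biases
W2 = rng.normal(0.0, 0.1, (m, 1))             # output weights

def residual_energy(residual, k):
    """Magnitude of the residual's projection onto sin(kx)."""
    return abs((residual * np.sin(k * x)).sum()) / n

for step in range(10001):
    h = np.tanh(x @ W1 + b1)                  # hidden activations, shape (n, m)
    err = h @ W2 - y                          # residual, shape (n, 1)
    gW2 = h.T @ err / n                       # gradients of the MSE loss
    gh = (err @ W2.T) * (1.0 - h ** 2)
    gW1, gb1 = x.T @ gh / n, gh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2
    if step % 2000 == 0:
        print(f"step {step:5d}  err@k=1: {residual_energy(err, 1):.3f}  "
              f"err@k=5: {residual_energy(err, 5):.3f}")
```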

Understanding multi-phase optimization dynamics and rich nonlinear behaviors of relu networks

M Wang, C Ma - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of the models and the non-convexity of the loss pose significant …

Empirical phase diagram for three-layer neural networks with infinite width

H Zhou, Q Zhou, Z Jin, T Luo… - Advances in Neural …, 2022 - proceedings.neurips.cc
Substantial work indicates that the dynamics of neural networks (NNs) are closely related to
the initialization of their parameters. Inspired by the phase diagram for two-layer ReLU NNs with …

Embedding principle of loss landscape of deep neural networks

Y Zhang, Z Zhang, T Luo, ZJ Xu - Advances in Neural …, 2021 - proceedings.neurips.cc
Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously
important. In this work, we prove an embedding principle that the loss landscape of a DNN …
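
The embedding principle referenced here rests on explicit embeddings from a narrower network's parameter space into a wider one, the simplest being neuron splitting. A quick numpy check of that construction (my own illustration, not the paper's code): duplicating a hidden neuron and splitting its output weight into two parts that sum to the original leaves the network function, and hence the loss, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 3))                      # a batch of inputs
relu = lambda z: np.maximum(z, 0.0)

# Narrow two-layer ReLU network: f(x) = sum_k a_k * relu(w_k . x).
W = rng.normal(size=(3, 4))                       # input weights of 4 hidden neurons
a = rng.normal(size=(4, 1))                       # output weights
f_narrow = relu(x @ W) @ a

# One-step embedding ("neuron splitting"): copy neuron 0's input weight and
# split its output weight into alpha * a_0 and (1 - alpha) * a_0.
alpha = 0.3
W_wide = np.hstack([W, W[:, :1]])                 # 5 neurons; the last duplicates neuron 0
a_wide = np.vstack([a, (1 - alpha) * a[:1]])
a_wide[0] *= alpha                                # the two copies' output weights sum to a_0
f_wide = relu(x @ W_wide) @ a_wide

# The wider network computes exactly the same function.
print(np.allclose(f_narrow, f_wide))              # True
```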

Mathematical introduction to deep learning: methods, implementations, and theory

A Jentzen, B Kuckuck, P von Wurstemberger - arXiv preprint arXiv …, 2023 - arxiv.org
This book aims to provide an introduction to the topic of deep learning algorithms. We review
essential components of deep learning algorithms in full mathematical detail including …

Implicit regularization of dropout

Z Zhang, ZQJ Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
It is important to understand how dropout, a popular regularization method, helps neural
network training reach a solution that generalizes well. In this work, we present a …
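
For context on the object being analyzed: dropout randomly zeroes each hidden unit during training and rescales the survivors so the expected activation is preserved, then acts as the identity at evaluation time. A generic sketch of this standard "inverted dropout" layer (a common implementation, not the specific model studied in the paper):

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected output equals the
    input; at evaluation time the layer is the identity."""
    if not training or p == 0.0:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p               # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)

h = np.ones((4, 8))
print(dropout(h, p=0.5, rng=np.random.default_rng(0)))   # ~half zeros, the rest equal 2
print(dropout(h, p=0.5, training=False))                  # unchanged at test time
```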

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

A Jentzen, A Riekert - arXiv preprint arXiv:2112.09684, 2021 - arxiv.org
In this article, we study fully-connected feedforward deep ReLU ANNs with an arbitrarily
large number of hidden layers, and we prove convergence of the risk of the GD optimization …

Towards understanding the condensation of neural networks at initial training

H Zhou, Q Zhou, T Luo, Y Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Empirical work shows that for ReLU neural networks (NNs) with small initialization, the input
weights of hidden neurons (the input weight of a hidden neuron consists of the weight from …
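
Condensation, the phenomenon studied here, means that with small initialization the input weight vectors of hidden neurons align along only a few directions during early training. A rough numpy illustration under my own assumptions (two-layer ReLU network without biases, Gaussian data, a smooth target, small initialization scale): after training, many pairs of neurons' input weights end up nearly parallel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 64, 2, 50                               # samples, input dim, hidden width
x = rng.normal(size=(n, d))
y = np.tanh(x @ rng.normal(size=(d, 1)))          # a smooth target function
relu = lambda z: np.maximum(z, 0.0)

scale = 1e-2                                      # small initialization scale
W = rng.normal(0.0, scale, (d, m))                # input weights (no biases)
a = rng.normal(0.0, scale, (m, 1))                # output weights
lr = 0.2

for step in range(20000):                         # full-batch gradient descent on MSE
    pre = x @ W
    h = relu(pre)
    err = h @ a - y
    ga = h.T @ err / n
    gW = x.T @ ((err @ a.T) * (pre > 0)) / n
    W -= lr * gW
    a -= lr * ga

# Cosine similarity between the input weight vectors of all neuron pairs:
# under condensation, most pairs are nearly parallel (|cos| close to 1).
u = W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)
cos = u.T @ u
off_diag = cos[~np.eye(m, dtype=bool)]
print("fraction of neuron pairs with |cos| > 0.99:", (np.abs(off_diag) > 0.99).mean())
```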

Loss spike in training neural networks

X Li, ZQJ Xu, Z Zhang - arXiv preprint arXiv:2305.12133, 2023 - arxiv.org
In this work, we investigate the mechanism underlying loss spikes observed during neural
network training. When the training enters a region with a lower-loss-as-sharper (LLAS) …

Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non …

S Ibragimov, A Jentzen, A Riekert - arXiv preprint arXiv:2212.13111, 2022 - arxiv.org
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) are
nowadays among the most heavily used computational schemes in the digital world. Despite …