F He, D Tao - arXiv preprint arXiv:2012.10931, 2020 - arxiv.org
Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of …
We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships …
L Ziyin, B Li, X Meng - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the …
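The snippet above concerns analytical global minima of deep linear networks with weight decay. As a toy illustration of the shallowest case only (a one-layer linear model, not the paper's deep or stochastic setting), the global minimum of the weight-decayed least-squares loss has the familiar ridge closed form, which can be checked numerically:

```python
import numpy as np

# Toy sketch (our own, not the paper's model): for a one-layer linear
# network with weight decay,
#   L(w) = ||X w - y||^2 + lam * ||w||^2,
# the unique global minimum has the closed form
#   w* = (X^T X + lam I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
lam = 0.1

w_star = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# The gradient 2 X^T (X w - y) + 2 lam w vanishes at w*.
grad = 2 * X.T @ (X @ w_star - y) + 2 * lam * w_star
print(np.max(np.abs(grad)))  # ~0 up to floating-point error
```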
B Ma, J Zhang, Y Xia, D Tao - Advances in neural …, 2020 - proceedings.neurips.cc
Attention modules have been demonstrated to be effective in strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both …
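The channel-reweighting idea mentioned in the snippet can be sketched in a few lines. This is a minimal squeeze-and-excitation-style gate in plain numpy; the names, shapes, and bottleneck width are our own illustrative assumptions, not any specific paper's module:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Illustrative channel-reweighting sketch (our own assumptions).

    x: feature map of shape (C, H, W); w1, w2: weights of a small
    bottleneck MLP that produces one gate per channel.
    """
    squeeze = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return x * scale[:, None, None]                # reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduce 8 channels to 2
w2 = rng.standard_normal((8, 2))   # expand back to 8 gates
out = channel_attention(x, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because each gate lies strictly in (0, 1), the module can only attenuate channels, never amplify them; spatial attention would instead produce one gate per location.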
S Lei, F He, Y Yuan, D Tao - IEEE Transactions on Neural …, 2023 - ieeexplore.ieee.org
This article discovers that the neural network (NN) with lower decision boundary (DB) variability has better generalizability. Two new notions, algorithm DB variability and data DB …
T Ding, D Li, R Sun - Mathematics of Operations Research, 2022 - pubsonline.informs.org
Does a large width eliminate all suboptimal local minima for neural nets? An affirmative answer was given by a classic result published in 1995 for one-hidden-layer wide neural …
B Liu, Z Liu, T Zhang, T Yuan - Neural Networks, 2021 - Elsevier
Whether sub-optimal local minima and saddle points exist in the highly non-convex loss landscape of deep neural networks has a great impact on the performance of optimization …
A Jentzen, A Riekert - arXiv preprint arXiv:2402.05155, 2024 - arxiv.org
Stochastic gradient descent (SGD) optimization methods, such as the plain vanilla SGD method and the popular Adam optimizer, are nowadays the methods of choice in the training …
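The plain vanilla SGD method referred to above can be sketched on the simplest possible objective, noiseless least squares, sampling one example per step. The data, learning rate, and step count here are illustrative choices of ours:

```python
import numpy as np

# Minimal sketch of plain (vanilla) SGD on a least-squares objective,
# drawing one random sample per step. Hyperparameters are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noiseless targets

w = np.zeros(3)
lr = 0.02
for step in range(2000):
    i = rng.integers(0, 100)                 # draw a random sample
    grad = 2 * (X[i] @ w - y[i]) * X[i]      # per-sample gradient
    w -= lr * grad                           # vanilla SGD update
print(w)  # close to w_true
```

Adam replaces the raw gradient in the update with a bias-corrected, coordinate-wise rescaled moving average of past gradients.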
In this paper, it is proved that for one-hidden-layer ReLU networks all differentiable local minima are global inside each differentiable region. Necessary and sufficient conditions for …
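Since the abstract is truncated, the setting can be restated in our own (assumed) notation as a one-hidden-layer ReLU network with squared loss:

```latex
% Notation is ours, sketching the setting the snippet describes.
f_\theta(x) = \sum_{i=1}^{k} a_i \,\max\!\bigl(0,\, w_i^{\top} x\bigr),
\qquad
L(\theta) = \sum_{j=1}^{n} \bigl( f_\theta(x_j) - y_j \bigr)^2 .
```

Fixing the activation pattern $s_{ij} = \mathbb{1}[\,w_i^{\top} x_j > 0\,]$ partitions parameter space into regions on which $L$ is smooth; the stated result is that every local minimum at which $L$ is differentiable is a global minimum within its region.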