F He, D Tao - arXiv preprint arXiv:2012.10931, 2020 - arxiv.org
Deep learning is usually described as an experiment-driven field under continuous criticism for lacking theoretical foundations. This problem has been partially addressed by a large volume of …
We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships …
L Ziyin, B Li, X Meng - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the …
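The snippet above concerns analytical global minima of deep linear networks with weight decay. As a toy illustration of the shallowest case only (a one-layer linear model, not the paper's deep or stochastic setting), the global minimum of the weight-decayed least-squares loss has the familiar ridge closed form, which can be checked numerically:

```python
import numpy as np

# Toy sketch (our own, not the paper's model): for a one-layer linear
# network with weight decay,
#   L(w) = ||X w - y||^2 + lam * ||w||^2,
# the unique global minimum has the closed form
#   w* = (X^T X + lam I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
lam = 0.1

w_star = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# The gradient 2 X^T (X w - y) + 2 lam w vanishes at w*.
grad = 2 * X.T @ (X @ w_star - y) + 2 * lam * w_star
print(np.max(np.abs(grad)))  # ~0 up to floating-point error
```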
B Ma, J Zhang, Y Xia, D Tao - Advances in neural …, 2020 - proceedings.neurips.cc
Attention modules have been demonstrated to be effective in strengthening the representation ability of a neural network by reweighting spatial or channel features, or by stacking both …
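The channel-reweighting idea mentioned in the snippet can be sketched in a few lines. This is a minimal squeeze-and-excitation-style gate in plain numpy; the names, shapes, and bottleneck width are our own illustrative assumptions, not any specific paper's module:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Illustrative channel-reweighting sketch (our own assumptions).

    x: feature map of shape (C, H, W); w1, w2: weights of a small
    bottleneck MLP that produces one gate per channel.
    """
    squeeze = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return x * scale[:, None, None]                # reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduce 8 channels to 2
w2 = rng.standard_normal((8, 2))   # expand back to 8 gates
out = channel_attention(x, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because each gate lies strictly in (0, 1), the module can only attenuate channels, never amplify them; spatial attention would instead produce one gate per location.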
S Lei, F He, Y Yuan, D Tao - IEEE Transactions on Neural …, 2023 - ieeexplore.ieee.org
This article discovers that the neural network (NN) with lower decision boundary (DB) variability has better generalizability. Two new notions, algorithm DB variability and data DB …
T Ding, D Li, R Sun - Mathematics of Operations Research, 2022 - pubsonline.informs.org
Does a large width eliminate all suboptimal local minima for neural nets? An affirmative answer was given by a classic result published in 1995 for one-hidden-layer wide neural …
B Liu, Z Liu, T Zhang, T Yuan - Neural Networks, 2021 - Elsevier
Whether sub-optimal local minima and saddle points exist in the highly non-convex loss landscape of deep neural networks has a great impact on the performance of optimization …
A Jentzen, A Riekert - arXiv preprint arXiv:2402.05155, 2024 - arxiv.org
Stochastic gradient descent (SGD) optimization methods, such as the plain vanilla SGD method and the popular Adam optimizer, are nowadays the methods of choice in the training …
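The plain vanilla SGD method referred to above can be sketched on the simplest possible objective, noiseless least squares, sampling one example per step. The data, learning rate, and step count here are illustrative choices of ours:

```python
import numpy as np

# Minimal sketch of plain (vanilla) SGD on a least-squares objective,
# drawing one random sample per step. Hyperparameters are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noiseless targets

w = np.zeros(3)
lr = 0.02
for step in range(2000):
    i = rng.integers(0, 100)                 # draw a random sample
    grad = 2 * (X[i] @ w - y[i]) * X[i]      # per-sample gradient
    w -= lr * grad                           # vanilla SGD update
print(w)  # close to w_true
```

Adam replaces the raw gradient in the update with a bias-corrected, coordinate-wise rescaled moving average of past gradients.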
In this paper, it is proved that for one-hidden-layer ReLU networks all differentiable local minima are global inside each differentiable region. Necessary and sufficient conditions for …
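Since the abstract is truncated, the setting can be restated in our own (assumed) notation as a one-hidden-layer ReLU network with squared loss:

```latex
% Notation is ours, sketching the setting the snippet describes.
f_\theta(x) = \sum_{i=1}^{k} a_i \,\max\!\bigl(0,\, w_i^{\top} x\bigr),
\qquad
L(\theta) = \sum_{j=1}^{n} \bigl( f_\theta(x_j) - y_j \bigr)^2 .
```

Fixing the activation pattern $s_{ij} = \mathbb{1}[\,w_i^{\top} x_j > 0\,]$ partitions parameter space into regions on which $L$ is smooth; the stated result is that every local minimum at which $L$ is differentiable is a global minimum within its region.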