Y Li, T Luo, NK Yip - arXiv preprint arXiv:2007.03714, 2020 - arxiv.org
Gradient descent yields zero training loss in polynomial time for deep neural networks
despite the non-convex nature of the objective function. The behavior of the network in the infinite …