Y Bai, B Krause, H Wang, C Xiong, R Socher - arXiv e-prints, 2020 - ui.adsabs.harvard.edu
We propose \emph{Taylorized training} as an initiative towards better understanding neural
network training at finite width. Taylorized training involves training the $k$-th order Taylor …