Taylorized training: Towards better approximation of neural network training at finite width

Y Bai, B Krause, H Wang, C Xiong, R Socher - arXiv preprint arXiv:2002.04010, 2020 - arxiv.org
We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the $k$-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training---a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase $k$, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially over $k$ in wide neural networks.
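For concreteness, the following is a minimal JAX sketch of the core idea: evaluating the $k$-th order Taylor expansion of a model in its parameters around initialization, which is the object trained in place of the full network. The apply function `f`, the parameter pytrees `params0`/`params`, and the helper name `taylorized_apply` are illustrative assumptions, not the authors' released code.

```python
import jax


def taylorized_apply(f, params0, params, x, k):
    """Evaluate the k-th order Taylor expansion (in the parameters) of
    f(params, x) around the initialization params0."""
    # Expansion direction: dparams = params - params0 (pytree arithmetic).
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)

    # g(t) = f(params0 + t * dparams, x); the k-th order Taylorized model
    # evaluated at params is sum_{j=0..k} g^{(j)}(0) / j!.
    def g(t):
        theta = jax.tree_util.tree_map(lambda p0, d: p0 + t * d,
                                       params0, dparams)
        return f(theta, x)

    out = g(0.0)
    deriv, fact = g, 1.0
    for j in range(1, k + 1):
        # Each nesting of jvp takes one more directional derivative of g.
        deriv = (lambda h: lambda t: jax.jvp(h, (t,), (1.0,))[1])(deriv)
        fact *= j
        out = out + deriv(0.0) / fact
    return out
```

In this sketch, Taylorized training would minimize the usual loss of `taylorized_apply` with respect to `params` (e.g. by SGD) while keeping `params0` fixed at initialization; setting `k = 1` recovers linearized (NTK-style) training.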