A Fan, T Lavril, E Grave, A Joulin… - arXiv e …, 2020 - ui.adsabs.harvard.edu
Transformers have been successfully applied to sequential, auto-regressive tasks despite
being feedforward networks. Unlike recurrent neural networks, Transformers use attention to …