Addressing Some Limitations of Transformers with Feedback Memory

A Fan, T Lavril, E Grave, A Joulin, S Sukhbaatar - arXiv preprint arXiv …, 2020 - arxiv.org
Transformers have been successfully applied to sequential, auto-regressive tasks despite
being feedforward networks. Unlike recurrent neural networks, Transformers use attention to …
