X Ma, C Zhou, X Kong, J He, L Gui, G Neubig… - arXiv preprint arXiv …, 2022 - arxiv.org
The design choices in the Transformer attention mechanism, including weak inductive bias
and quadratic computational complexity, have limited its application for modeling long …
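The quadratic complexity the abstract refers to comes from the attention score matrix, whose size grows with the square of the sequence length. A minimal NumPy sketch of standard scaled dot-product attention (not code from the paper) makes the cost explicit:

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention.

    The score matrix q @ k.T has shape (n, n), so time and memory
    both grow quadratically in the sequence length n -- the
    complexity limitation the abstract refers to.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # (n, n): the O(n^2) term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                      # (n, d) output

# Illustrative shapes only; n = 128 tokens, d = 16 dims.
n, d = 128, 16
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(q, k, v)
print(out.shape)
```

Doubling the sequence length quadruples the size of the `(n, n)` score matrix, which is the bottleneck that long-sequence attention variants aim to remove.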