Explicit sparse transformer: Concentrated attention through explicit selection

G Zhao, J Lin, Z Zhang, X Ren, Q Su, X Sun - arXiv preprint arXiv …, 2019 - arxiv.org
The self-attention-based Transformer has demonstrated state-of-the-art performance in a
number of natural language processing tasks. Self-attention is able to model long-term …
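The paper's title describes its core idea: concentrating attention by explicitly selecting only the most relevant positions. A minimal sketch of that idea, assuming it amounts to keeping the top-k attention logits per query and masking the rest before the softmax (the function name `topk_sparse_attention` and all shapes here are illustrative, not the authors' code):

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=2):
    """Scaled dot-product attention that keeps only the top-k logits per
    query (the 'explicit selection') and masks the rest to -inf before
    the softmax, concentrating attention mass on a few positions."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)               # (n_q, n_k) attention logits
    # threshold each row at its k-th largest logit; mask everything below
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # softmax over the surviving logits only (exp(-inf) = 0)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 4))
out = topk_sparse_attention(Q, K, V, k=2)
print(out.shape)  # (3, 4): one output vector per query
```

With k equal to the full key length this reduces to ordinary dense softmax attention; smaller k forces each query to attend to only its k best-matching keys.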

Sparse transformer: Concentrated attention through explicit selection

G Zhao, J Lin, Z Zhang, X Ren, X Sun - 2019 - openreview.net
The self-attention-based Transformer has demonstrated state-of-the-art performance in a
number of natural language processing tasks. Self-attention is able to model long-term …