Y Wang, X Chen, L Cao, W Huang, F Sun… - arXiv preprint arXiv …, 2022 - arxiv.org
Many adaptations of transformers have emerged to address single-modal vision tasks,
where self-attention modules are stacked to handle input sources like images. Intuitively …