S Chen, C Ge, Z Tong, J Wang, Y Song, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Pretraining Vision Transformers (ViTs) has achieved great success in visual recognition. A
natural next step is to adapt a ViT to various image and video recognition tasks. The …