T Zhou, Y Niu, H Lu, C Peng, Y Guo, H Zhou - Information Fusion, 2024 - Elsevier
Abstract: Vision Transformer (ViT) is widely used in computer vision. ViT has four main steps, called the “four secrets”, including patch division, token selection, position …