Authors
Qiming Zhang*, Yufei Xu*, Jing Zhang, Dacheng Tao
Publication date
2022/2/21
Journal
IJCV 2022
Description
Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism. Nevertheless, they treat an image as a 1D sequence of visual tokens, lacking an intrinsic inductive bias (IB) in modeling local visual structures and dealing with scale variance, which is instead learned implicitly from large-scale training data with longer training schedules. In this paper, we leverage the two IBs and propose the ViTAE transformer, which utilizes a reduction cell for multi-scale features and a normal cell for locality. The two kinds of cells are stacked in both isotropic and multi-stage manners to formulate two families of ViTAE models, i.e., the vanilla ViTAE and ViTAEv2. Experiments on the ImageNet dataset as well as downstream tasks on the MS COCO, ADE20K, and AP10K datasets validate the superiority of our models over …
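The core idea in the abstract, a downsampling "reduction cell" that injects a multi-scale prior and a "normal cell" that pairs self-attention with a parallel convolutional branch for locality, can be illustrated with a minimal sketch. This is a simplified, hypothetical PyTorch rendering under assumed module names (ReductionCell, NormalCell, TinyViTAELike), dimensions, and fusion details; it is not the authors' released implementation.

```python
# Minimal illustrative sketch of the reduction-cell / normal-cell stacking idea.
# All names, sizes, and fusion choices here are assumptions for illustration only.
import torch
import torch.nn as nn


class NormalCell(nn.Module):
    """Keeps resolution; fuses self-attention (long-range) with a parallel
    depth-wise convolution branch (locality prior)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x: torch.Tensor, hw) -> torch.Tensor:
        # x: (B, N, C) token sequence; hw: spatial size used by the conv branch
        h, w = hw
        b, n, c = x.shape
        y = self.norm(x)
        attn_out, _ = self.attn(y, y, y)
        conv_out = self.local(x.transpose(1, 2).reshape(b, c, h, w)).flatten(2).transpose(1, 2)
        x = x + attn_out + conv_out          # fuse global and local branches
        return x + self.mlp(self.norm(x))


class ReductionCell(nn.Module):
    """Downsamples tokens with a strided convolution, providing the
    multi-scale / scale-variance prior between stages."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_dim, out_dim, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor, hw):
        h, w = hw
        b, n, c = x.shape
        x = self.reduce(x.transpose(1, 2).reshape(b, c, h, w))
        h, w = x.shape[-2:]
        return x.flatten(2).transpose(1, 2), (h, w)


class TinyViTAELike(nn.Module):
    """Multi-stage stacking: each stage = one reduction cell + a few normal cells."""

    def __init__(self, in_chans=3, dims=(64, 128, 256), depth_per_stage=2, num_classes=1000):
        super().__init__()
        self.stem = nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4)  # patchify
        self.stages = nn.ModuleList()
        for i, dim in enumerate(dims):
            in_dim = dims[i - 1] if i > 0 else dims[0]
            self.stages.append(nn.ModuleDict({
                "reduce": ReductionCell(in_dim, dim) if i > 0 else nn.Identity(),
                "normal": nn.ModuleList([NormalCell(dim) for _ in range(depth_per_stage)]),
            }))
        self.head = nn.Linear(dims[-1], num_classes)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.stem(img)                          # (B, C, H/4, W/4)
        hw = x.shape[-2:]
        x = x.flatten(2).transpose(1, 2)            # to token sequence (B, N, C)
        for i, stage in enumerate(self.stages):
            if i > 0:
                x, hw = stage["reduce"](x, hw)      # downsample between stages
            for cell in stage["normal"]:
                x = cell(x, hw)
        return self.head(x.mean(dim=1))             # global average pooling + classifier


if __name__ == "__main__":
    logits = TinyViTAELike()(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 1000])
```

The isotropic family described in the abstract would instead keep a single resolution and channel width after an initial reduction, while the multi-stage family (as sketched) downsamples between stages; the sketch shows only the latter.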