Delving deep into the generalization of vision transformers under distribution shifts

Authors

Chongzhi Zhang#, Mingyuan Zhang#, Shanghang Zhang#, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Shuai Yi, Xianglong Liu, and Ziwei Liu

Publication date

2022

Conference

Conference on Computer Vision and Pattern Recognition (CVPR)

Description

Recently, Vision Transformers have achieved impressive results on various vision tasks. Yet, their generalization ability under different distribution shifts is poorly understood. In this work, we provide a comprehensive study on the out-of-distribution generalization of Vision Transformers. To support a systematic investigation, we first present a taxonomy of distribution shifts by categorizing them into five conceptual levels: corruption shift, background shift, texture shift, destruction shift, and style shift. Then we perform extensive evaluations of Vision Transformer variants under different levels of distribution shifts and compare their generalization ability with Convolutional Neural Network (CNN) models. Several important observations are obtained: 1) Vision Transformers generalize better than CNNs under multiple distribution shifts. With the same or fewer parameters, Vision Transformers are ahead of corresponding CNNs by more than 5% in top-1 accuracy under most types of distribution shift. In particular, Vision Transformers lead by more than 10% under the corruption shifts. 2) Larger Vision Transformers gradually narrow the in-distribution (ID) and out-of-distribution (OOD) performance gap. To further improve the generalization of Vision Transformers, we design enhanced Vision Transformers through self-supervised learning, information theory, and adversarial learning. By investigating these three types of generalization-enhanced Transformers, we observe the gradient-sensitivity of Vision Transformers and design a smoother learning strategy to achieve a stable training process. With modified training schemes, we achieve …

Total citations

Cited by 87

Citations per year: 2021: 2, 2022: 21, 2023: 39, 2024: 25

Scholar articles

Delving deep into the generalization of vision transformers under distribution shifts

C Zhang, M Zhang, S Zhang, D Jin, Q Zhou, Z Cai… - Proceedings of the IEEE/CVF conference on Computer …, 2022
