AdaptFormer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …

Hiera: A hierarchical vision transformer without the bells-and-whistles

C Ryali, YT Hu, D Bolya, C Wei, H Fan… - International …, 2023 - proceedings.mlr.press
Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …

ResT: An efficient transformer for visual recognition

Q Zhang, YB Yang - Advances in neural information …, 2021 - proceedings.neurips.cc
This paper presents an efficient multi-scale vision Transformer, called ResT, that capably
serves as a general-purpose backbone for image recognition. Unlike existing Transformer …

AdaViT: Adaptive vision transformers for efficient image recognition

L Meng, H Li, BC Chen, S Lan, Z Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Built on top of self-attention mechanisms, vision transformers have demonstrated
remarkable performance on a variety of vision tasks recently. While achieving excellent …

HiViT: A simpler and more efficient design of hierarchical vision transformer

X Zhang, Y Tian, L Xie, W Huang, Q Dai… - The Eleventh …, 2023 - openreview.net
There has been a debate on the choice of plain vs. hierarchical vision transformers, where
researchers often believe that the former (e.g., ViT) has a simpler design but the latter (e.g., …

VOLO: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. Owing to
low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the …

Scalable vision transformers with hierarchical pooling

Z Pan, B Zhuang, J Liu, H He… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
The recently proposed Visual image Transformers (ViT) with pure attention have achieved
promising performance on image recognition tasks, such as image classification. However …

Not all patches are what you need: Expediting vision transformers via token reorganizations

Y Liang, C Ge, Z Tong, Y Song, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head
self-attention (MHSA) among them. Complete leverage of these image tokens brings …

Discrete representations strengthen vision transformer robustness

C Mao, L Jiang, M Dehghani, C Vondrick… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image
recognition. While recent studies suggest that ViTs are more robust than their convolutional …

Visformer: The vision-friendly transformer

Z Chen, L Xie, J Niu, X Liu, L Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
The past year has witnessed the rapid development of applying the Transformer module to
vision problems. While some researchers have demonstrated that Transformer-based …