BiFormer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …

MPViT: Multi-path vision transformer for dense prediction

Y Lee, J Kim, J Willette… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Dense computer vision tasks such as object detection and segmentation require effective
multi-scale feature representation for detecting or classifying objects or regions with varying …

Vision transformer with super token sampling

H Huang, X Zhou, J Cao, R He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision transformer has achieved impressive performance for many vision tasks. However, it
may suffer from high redundancy in capturing local features for shallow layers. Local self …

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the …, 2022 - openaccess.thecvf.com
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …

FasterViT: Fast vision transformers with hierarchical attention

A Hatamizadeh, G Heinrich, H Yin, A Tao… - arXiv preprint arXiv …, 2023 - arxiv.org
We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus
on high image throughput for computer vision (CV) applications. FasterViT combines the …

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

D Shi - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Due to the depth degradation effect in residual connections, many efficient Vision
Transformer models that rely on stacking layers for information exchange often fail to form …

Local-to-global self-attention in vision transformers

J Li, Y Yan, S Liao, X Yang, L Shao - arXiv preprint arXiv:2107.04735, 2021 - arxiv.org
Transformers have demonstrated great potential in computer vision tasks. To avoid dense
computations of self-attentions in high-resolution visual data, some recent Transformer …

Learned queries for efficient local attention

M Arar, A Shamir, AH Bermano - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Vision Transformers (ViT) serve as powerful vision models. Unlike convolutional
neural networks, which dominated vision research in previous years, vision transformers …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

Vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Transformers have recently shown superior performance on various vision tasks. The large,
sometimes even global, receptive field endows Transformer models with higher …