BiFormer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependency. However, such power comes at a cost: it incurs a huge computation …

MPViT: Multi-path vision transformer for dense prediction

Y Lee, J Kim, J Willette… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Dense computer vision tasks such as object detection and segmentation require effective
multi-scale feature representation for detecting or classifying objects or regions with varying …

Vision transformer with super token sampling

H Huang, X Zhou, J Cao, R He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision transformer has achieved impressive performance for many vision tasks. However, it
may suffer from high redundancy in capturing local features for shallow layers. Local self …

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the …, 2022 - openaccess.thecvf.com
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …

FasterViT: Fast vision transformers with hierarchical attention

A Hatamizadeh, G Heinrich, H Yin, A Tao… - arXiv preprint arXiv …, 2023 - arxiv.org
We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus
on high image throughput for computer vision (CV) applications. FasterViT combines the …

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

D Shi - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Due to the depth degradation effect in residual connections, many efficient Vision
Transformer models that rely on stacking layers for information exchange often fail to form …

Local-to-global self-attention in vision transformers

J Li, Y Yan, S Liao, X Yang, L Shao - arXiv preprint arXiv:2107.04735, 2021 - arxiv.org
Transformers have demonstrated great potential in computer vision tasks. To avoid dense
computations of self-attentions in high-resolution visual data, some recent Transformer …

Learned queries for efficient local attention

M Arar, A Shamir, AH Bermano - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Vision Transformers (ViT) serve as powerful vision models. Unlike convolutional
neural networks, which dominated vision research in previous years, vision transformers …

FastViT: A fast hybrid vision transformer using structural reparameterization

PKA Vasu, J Gabriel, J Zhu, O Tuzel… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent amalgamation of transformer and convolutional designs has led to steady
improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a …

Vision transformer with deformable attention

Z Xia, X Pan, S Song, LE Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Transformers have recently shown superior performance on various vision tasks. The large,
sometimes even global, receptive field endows Transformer models with higher …