The Transformer, first applied in the field of natural language processing, is a type of deep neural network based mainly on the self-attention mechanism. Thanks to its strong representation …
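For readers unfamiliar with the mechanism these abstracts refer to, a minimal single-head scaled dot-product self-attention sketch is given below. This is an illustrative NumPy implementation under assumed shapes (4 tokens, model width 8), not any particular paper's code; the weight matrices `Wq`, `Wk`, `Wv` are random placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices.
    Returns: (seq_len, d_k) attended representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (seq_len, seq_len) similarities
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # weighted mixture of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # (4, 8)
```

Each output token is a convex combination of all value vectors, which is what gives the Transformer the global receptive field the snippets below contrast with CNNs.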
Motivated by the success of Transformers in natural language processing (NLP) tasks, there have been some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However …
After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as …
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient …
The Vision Transformer (ViT) extends the application range of transformers from language processing to computer vision tasks, as an alternative architecture to …
Transformers, which are popular for language modeling, have recently been explored for solving vision tasks, e.g., the Vision Transformer (ViT) for image classification. The ViT model …
Transformer networks have made great progress on computer vision tasks. The Transformer-in-Transformer (TNT) architecture uses an inner transformer and an outer transformer to extract …
Transformer, an attention-based encoder–decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, some …
The vision transformer (ViT) has recently shown a strong ability to achieve results comparable to convolutional neural networks (CNNs) on image classification. However, the vanilla …