Training data-efficient image transformers & distillation through attention

H Touvron, M Cord, M Douze, F Massa… - International …, 2021 - proceedings.mlr.press
Recently, neural networks purely based on attention were shown to address image
understanding tasks such as image classification. These high-performing vision …

Going deeper with image transformers

H Touvron, M Cord, A Sablayrolles… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers have been recently adapted for large-scale image classification, achieving
high scores and shaking up the long supremacy of convolutional neural networks. However, the …

CMT: Convolutional neural networks meet vision transformers

J Guo, K Han, H Wu, Y Tang, X Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision transformers have been successfully applied to image recognition tasks due to their
ability to capture long-range dependencies within an image. However, there are still gaps in …

AdaViT: Adaptive vision transformers for efficient image recognition

L Meng, H Li, BC Chen, S Lan, Z Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Built on top of self-attention mechanisms, vision transformers have demonstrated
remarkable performance on a variety of vision tasks recently. While achieving excellent …

CrossViT: Cross-attention multi-scale vision transformer for image classification

CFR Chen, Q Fan, R Panda - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
The recently developed vision transformer (ViT) has achieved promising results on image
classification compared to convolutional neural networks. Inspired by this, in this paper, we …

Token labeling: Training a 85.5% top-1 accuracy vision transformer with 56M parameters on ImageNet

Z Jiang, Q Hou, L Yuan, D Zhou, X Jin… - arXiv preprint arXiv …, 2021 - academia.edu
This paper provides a strong baseline for vision transformers on the ImageNet classification
task. While recent vision transformers have demonstrated promising results in ImageNet …

Incorporating convolution designs into visual transformers

K Yuan, S Guo, Z Liu, A Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com
Motivated by the success of Transformers in natural language processing (NLP) tasks, there
exist some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However …

An image is worth 16x16 words: Transformers for image recognition at scale

A Dosovitskiy, L Beyer, A Kolesnikov… - arXiv preprint arXiv …, 2020 - arxiv.org
While the Transformer architecture has become the de-facto standard for natural language
processing tasks, its applications to computer vision remain limited. In vision, attention is …
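The "16x16 words" in the title above refers to ViT's core preprocessing step: an image is cut into non-overlapping 16x16 patches, each flattened into a vector that is treated as one token. A minimal sketch of that patchification (illustrative values only; the 224x224 resolution and patch size 16 are the common ViT defaults, not taken from the snippet):

```python
import numpy as np

# Hypothetical example: split a 224x224 RGB image into non-overlapping
# 16x16 patches, as in the "16x16 words" framing of the ViT title.
H = W = 224          # input resolution (common ViT default)
P = 16               # patch size
C = 3                # RGB channels

image = np.zeros((H, W, C), dtype=np.float32)  # dummy input

# Reshape into a grid of patches, then flatten each patch into a
# vector ("token"): (H/P * W/P) tokens of dimension P*P*C.
patches = image.reshape(H // P, P, W // P, P, C)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)

print(patches.shape)  # (196, 768): 196 tokens, each a 768-dim "word"
```

In the full model, each 768-dimensional patch vector is then linearly projected to the Transformer's embedding dimension before self-attention is applied.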

Refiner: Refining self-attention for vision transformers

D Zhou, Y Shi, B Kang, W Yu, Z Jiang, Y Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks
compared with CNNs. Yet, they generally require much more data for model pre-training …

Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet

L Yuan, Y Chen, T Wang, W Yu, Y Shi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers, which are popular for language modeling, have recently been explored for solving
vision tasks, e.g., the Vision Transformer (ViT) for image classification. The ViT model …