Merging Vision Transformers from Different Tasks and Domains

P Ye, C Huang, M Shen, T Chen, Y Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
This work aims to merge various Vision Transformers (ViTs) trained on different tasks (i.e.,
datasets with different object categories) or domains (i.e., datasets with the same categories …

BViT: Broad attention-based vision transformer

N Li, Y Chen, W Li, Z Ding, D Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Recent works have demonstrated that transformers can achieve promising performance in
computer vision by exploiting the relationships among image patches with self-attention …

DeepViT: Towards deeper vision transformer

D Zhou, B Kang, X Jin, L Yang, X Lian, Z Jiang… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision transformers (ViTs) have been successfully applied in image classification tasks
recently. In this paper, we show that, unlike convolutional neural networks (CNNs) that can be …

Enhancing performance of vision transformers on small datasets through local inductive bias incorporation

IB Akkaya, SS Kathiresan, E Arani, B Zonooz - Pattern Recognition, 2024 - Elsevier
Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to
perform worse than convolutional neural networks (CNNs) when trained from scratch on …

SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection

F Ataiefard, W Ahmed, H Hajimolahoseini… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision transformers are known to be more computationally and data-intensive than CNN
models. These transformer models, such as ViT, require all the input image tokens to learn …

A unified pruning framework for vision transformers

H Yu, J Wu - Science China Information Sciences, 2023 - Springer
In this study, we proposed a novel method called UP-ViTs to prune ViTs in a
unified manner. Our framework can prune all components in a ViT and its variants, maintain …

OAMixer: Object-aware mixing layer for vision transformers

H Kang, S Mo, J Shin - arXiv preprint arXiv:2212.06595, 2022 - arxiv.org
Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive
results on various visual recognition tasks as alternatives to classic convolutional networks. While …

The principle of diversity: Training stronger vision transformers calls for reducing all levels of redundancy

T Chen, Z Zhang, Y Cheng… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision transformers (ViTs) have gained increasing popularity as they are commonly believed
to have higher modeling capacity and representation flexibility than traditional convolutional …

Holistically explainable vision transformers

M Böhle, M Fritz, B Schiele - arXiv preprint arXiv:2301.08669, 2023 - arxiv.org
Transformers increasingly dominate the machine learning landscape across many tasks and
domains, which increases the importance of understanding their outputs. While their …

Not all patches are what you need: Expediting vision transformers via token reorganizations

Y Liang, C Ge, Z Tong, Y Song, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head
self-attention (MHSA) among them. Fully leveraging these image tokens brings …
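
Several of the entries above (token skipping, token reorganization, pruning) start from the same basic computation this last snippet describes: every image patch becomes a token and MHSA is computed over the full token sequence. The following is a minimal PyTorch sketch of that mechanism only; it is not taken from any of the listed papers, and the patch size, embedding dimension, and head count are illustrative assumptions.

```python
# Minimal sketch: patch tokenization + multi-head self-attention over all tokens.
# Hyperparameters below are illustrative, not from any cited paper.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)            # (batch, channels, height, width)
patch_size, embed_dim, num_heads = 16, 192, 3  # hypothetical small-ViT settings

# Patch embedding: a strided convolution projects each 16x16 patch to one token.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
tokens = patch_embed(image).flatten(2).transpose(1, 2)  # (1, 196, 192): 14x14 patches

# MHSA over all patch tokens; every token attends to every other token,
# which is the quadratic cost that token-pruning/reorganization methods reduce.
mhsa = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
out, attn = mhsa(tokens, tokens, tokens)
print(out.shape, attn.shape)  # torch.Size([1, 196, 192]), torch.Size([1, 196, 196])
```

The attention map returned here (one weight per token pair) is also the signal that token-reorganization methods typically inspect when deciding which patches to keep.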