VSA: Learning varied-size window attention in vision transformers

Y Xu, J Zhang, Q Zhang, D Tao - Advances in Neural …, 2022 - proceedings.neurips.cc

Although no specific domain knowledge is considered in the design, plain vision
transformers have shown excellent performance in visual recognition tasks. However, little …

Cited by 435 · Related articles · All 5 versions

[PDF] arxiv.org

ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond

Q Zhang, Y Xu, J Zhang, D Tao - International Journal of Computer Vision, 2023 - Springer

Vision transformers have shown great potential in various computer vision tasks owing to
their strong capability to model long-range dependency using the self-attention mechanism …

Cited by 194 · Related articles · All 7 versions

[PDF] arxiv.org

Advancing plain vision transformer toward remote sensing foundation model

D Wang, Q Zhang, Y Xu, J Zhang, B Du… - … on Geoscience and …, 2022 - ieeexplore.ieee.org

Large-scale vision foundation models have made significant progress in visual tasks on
natural images, with vision transformers (ViTs) being the primary choice due to their good …

Cited by 158 · Related articles · All 4 versions

[PDF] neurips.cc

Fast vision transformers with HiLo attention

Z Pan, J Cai, B Zhuang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

Vision Transformers (ViTs) have triggered the most recent and significant
breakthroughs in computer vision. Their efficient designs are mostly guided by the indirect …

Cited by 114 · Related articles · All 8 versions

[PDF] thecvf.com

N-gram in Swin Transformers for efficient lightweight image super-resolution

H Choi, J Lee, J Yang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

While some studies have proven that Swin Transformer (Swin) with window self-attention
(WSA) is suitable for single image super-resolution (SR), the plain WSA ignores the broad …

Cited by 75 · Related articles · All 5 versions

[PDF] thecvf.com

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

Cited by 20 · Related articles · All 3 versions

[PDF] arxiv.org

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer

Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

Cited by 42 · Related articles · All 6 versions

[PDF] thecvf.com

ESSAformer: Efficient transformer for hyperspectral image super-resolution

M Zhang, C Zhang, Q Zhang, J Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-
resolution hyperspectral image from a low-resolution observation. However, the prevailing …

Cited by 19 · Related articles · All 5 versions

[PDF] arxiv.org

Learning graph neural networks for image style transfer

Y Jing, Y Mao, Y Yang, Y Zhan, M Song… - … on Computer Vision, 2022 - Springer

State-of-the-art parametric and non-parametric style transfer approaches are prone to either
distorted local style patterns due to global statistics alignment, or unpleasing artifacts …

Cited by 56 · Related articles · All 5 versions

[PDF] arxiv.org

Swin3D: A pretrained transformer backbone for 3D indoor scene understanding

YQ Yang, YX Guo, JY Xiong, Y Liu, H Pan… - arXiv preprint arXiv …, 2023 - arxiv.org

The use of pretrained backbones with fine-tuning has been successful for 2D vision and
natural language processing tasks, showing advantages over task-specific networks. In this …

Cited by 44 · Related articles · All 2 versions
