ViTPose: Simple vision transformer baselines for human pose estimation

Y Xu, J Zhang, Q Zhang, D Tao - Advances in Neural …, 2022 - proceedings.neurips.cc
Although no specific domain knowledge is considered in the design, plain vision
transformers have shown excellent performance in visual recognition tasks. However, little …

ViTAEv2: Vision transformer advanced by exploring inductive bias for image recognition and beyond

Q Zhang, Y Xu, J Zhang, D Tao - International Journal of Computer Vision, 2023 - Springer
Vision transformers have shown great potential in various computer vision tasks owing to
their strong capability to model long-range dependency using the self-attention mechanism …

Advancing plain vision transformer toward remote sensing foundation model

D Wang, Q Zhang, Y Xu, J Zhang, B Du… - … on Geoscience and …, 2022 - ieeexplore.ieee.org
Large-scale vision foundation models have made significant progress in visual tasks on
natural images, with vision transformers (ViTs) being the primary choice due to their good …

Fast vision transformers with HiLo attention

Z Pan, J Cai, B Zhuang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have triggered the most recent and significant
breakthroughs in computer vision. Their efficient designs are mostly guided by the indirect …

N-gram in swin transformers for efficient lightweight image super-resolution

H Choi, J Lee, J Yang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
While some studies have proven that Swin Transformer (Swin) with window self-attention
(WSA) is suitable for single image super-resolution (SR), the plain WSA ignores the broad …

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer
Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

ESSAformer: Efficient transformer for hyperspectral image super-resolution

M Zhang, C Zhang, Q Zhang, J Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-
resolution hyperspectral image from a low-resolution observation. However, the prevailing …

Learning graph neural networks for image style transfer

Y Jing, Y Mao, Y Yang, Y Zhan, M Song… - … on Computer Vision, 2022 - Springer
State-of-the-art parametric and non-parametric style transfer approaches are prone to either
distorted local style patterns due to global statistics alignment, or unpleasing artifacts …

Swin3D: A pretrained transformer backbone for 3D indoor scene understanding

YQ Yang, YX Guo, JY Xiong, Y Liu, H Pan… - arXiv preprint arXiv …, 2023 - arxiv.org
The use of pretrained backbones with fine-tuning has been successful for 2D vision and
natural language processing tasks, showing advantages over task-specific networks. In this …