Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Wave-ViT: Unifying wavelet and transformers for visual representation learning

T Yao, Y Pan, Y Li, CW Ngo, T Mei - European Conference on Computer …, 2022 - Springer
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for
computer vision tasks, while the self-attention computation in Transformer scales …

MetaFormer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer
Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

CRFormer: Cross-resolution transformer for segmentation of grape leaf diseases with context mining

X Zhang, C Cen, F Li, M Liu, W Mu - Expert Systems with Applications, 2023 - Elsevier
In the smart agriculture community, automatic segmentation is an important basis for plant
disease detection and identification. However, the complex background and texturally rich …

Learning orthogonal prototypes for generalized few-shot semantic segmentation

SA Liu, Y Zhang, Z Qiu, H Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generalized few-shot semantic segmentation (GFSS) distinguishes pixels of base and novel
classes from the background simultaneously, conditioning on sufficient data of base classes …

ObjectFusion: Multi-modal 3D object detection with object-centric fusion

Q Cai, Y Pan, T Yao, CW Ngo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent progress on multi-modal 3D object detection has featured BEV (Bird-Eye-View)
based fusion, which effectively unifies both LiDAR point clouds and camera images in a …

A data-scalable transformer for medical image segmentation: architecture, model efficiency, and benchmark

Y Gao, M Zhou, D Liu, Z Yan, S Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformers have demonstrated remarkable performance in natural language processing
and computer vision. However, existing vision Transformers struggle to learn from limited …

Control3D: Towards controllable text-to-3D generation

Y Chen, Y Pan, Y Li, T Yao, T Mei - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Recent remarkable advances in large-scale text-to-image diffusion models have inspired a
significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a …