Dual vision transformer

Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier

The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Cited by 148 · Related articles · All 7 versions

[PDF] arxiv.org

Wave-ViT: Unifying wavelet and transformers for visual representation learning

T Yao, Y Pan, Y Li, CW Ngo, T Mei - European Conference on Computer …, 2022 - Springer

Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for
computer vision tasks, while the self-attention computation in Transformer scales …

Cited by 161 · Related articles · All 7 versions

[PDF] arxiv.org

MetaFormer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

Cited by 169 · Related articles · All 9 versions

[PDF] thecvf.com

RMT: Retentive networks meet vision transformers

Q Fan, H Huang, M Chen, H Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Vision Transformer (ViT) has gained increasing attention in the computer vision
community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …

Cited by 68 · Related articles · All 3 versions

[PDF] arxiv.org

A survey of the vision transformers and their CNN-transformer based variants

A Khan, Z Rauf, A Sohail, AR Khan, H Asif… - Artificial Intelligence …, 2023 - Springer

Vision transformers have become popular as a possible substitute to convolutional neural
networks (CNNs) for a variety of computer vision applications. These transformers, with their …

Cited by 108 · Related articles · All 6 versions

CRFormer: cross-resolution transformer for segmentation of grape leaf diseases with context mining

X Zhang, C Cen, F Li, M Liu, W Mu - Expert Systems with Applications, 2023 - Elsevier

In the smart agriculture community, automatic segmentation is an important basis for plant
disease detection and identification. However, the complex background and texturally rich …

Cited by 16 · Related articles · All 2 versions

[PDF] thecvf.com

Learning orthogonal prototypes for generalized few-shot semantic segmentation

SA Liu, Y Zhang, Z Qiu, H Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com

Generalized few-shot semantic segmentation (GFSS) distinguishes pixels of base and novel
classes from the background simultaneously, conditioning on sufficient data of base classes …

Cited by 41 · Related articles · All 3 versions

[PDF] thecvf.com

ObjectFusion: Multi-modal 3D object detection with object-centric fusion

Q Cai, Y Pan, T Yao, CW Ngo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Recent progress on multi-modal 3D object detection has featured BEV (Bird-Eye-View)
based fusion, which effectively unifies both LiDAR point clouds and camera images in a …

Cited by 28 · Related articles · All 5 versions

[PDF] arxiv.org

A data-scalable transformer for medical image segmentation: architecture, model efficiency, and benchmark

Y Gao, M Zhou, D Liu, Z Yan, S Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org

Transformers have demonstrated remarkable performance in natural language processing
and computer vision. However, existing vision Transformers struggle to learn from limited …

Cited by 113 · Related articles · All 2 versions

[PDF] arxiv.org

Control3D: Towards controllable text-to-3D generation

Y Chen, Y Pan, Y Li, T Yao, T Mei - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Recent remarkable advances in large-scale text-to-image diffusion models have inspired a
significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a …

Cited by 42 · Related articles · All 4 versions
