Transformers in medical imaging: A survey

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier
Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

[HTML][HTML] Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Segnext: Rethinking convolutional attention design for semantic segmentation

MH Guo, CZ Lu, Q Hou, Z Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present SegNeXt, a simple convolutional network architecture for semantic
segmentation. Recent transformer-based models have dominated the field of se-mantic …

Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip

Q Yu, J He, X Deng, X Shen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …

Adaptformer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion

Z Zhao, H Bai, J Zhang, Y Zhang, S Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of
different modalities, eg, functional highlight and detailed textures. To tackle the challenge in …

SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer

J Ma, L Tang, F Fan, J Huang, X Mei… - IEEE/CAA Journal of …, 2022 - ieeexplore.ieee.org
This study proposes a novel general image fusion framework based on cross-domain long-
range learning and Swin Transformer, termed as SwinFusion. On the one hand, an attention …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Visual prompt tuning

M Jia, L Tang, BC Chen, C Cardie, S Belongie… - … on Computer Vision, 2022 - Springer
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, ie., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …