Recent advances in vision transformer: A survey and outlook of recent work

K Islam - arXiv preprint arXiv:2203.01536, 2022 - arxiv.org
Vision Transformers (ViTs) are becoming an increasingly popular and dominant technique for
various vision tasks, compared to Convolutional Neural Networks (CNNs). As a demanding …

TransU-Net++: Rethinking attention gated TransU-Net for deforestation mapping

A Jamali, SK Roy, J Li, P Ghamisi - International Journal of Applied Earth …, 2023 - Elsevier
Deforestation has become a major cause of climate change, and as a result, both
characterizing the drivers and estimating segmentation maps of deforestation have piqued …

Dual cross-attention learning for fine-grained visual categorization and object re-identification

H Zhu, W Ke, D Li, J Liu, L Tian… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, self-attention mechanisms have shown impressive performance in various NLP
and CV tasks, which can help capture sequential characteristics and derive global …

Generative prompt model for weakly supervised object localization

Y Zhao, Q Ye, W Wu, C Shen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Weakly supervised object localization (WSOL) remains challenging when learning object
localization models from image category labels. Conventional methods that discriminatively …

TransIFC: Invariant cues-aware feature concentration learning for efficient fine-grained bird image classification

H Liu, C Zhang, Y Deng, B Xie, T Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Fine-grained bird image classification (FBIC) is not only meaningful for endangered bird
observation and protection but also a prevalent task for image classification in multimedia …

Feature fusion vision transformer for fine-grained visual categorization

J Wang, X Yu, Y Gao - arXiv preprint arXiv:2107.02341, 2021 - arxiv.org
The core of tackling fine-grained visual categorization (FGVC) is to learn subtle yet
discriminative features. Most previous works achieve this by explicitly selecting the …

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the …, 2022 - openaccess.thecvf.com
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …

Learning bottleneck concepts in image classification

B Wang, L Li, Y Nakashima… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Interpreting and explaining the behavior of deep neural networks is critical for many tasks.
Explainable AI provides a way to address this challenge, mostly by providing per-pixel …

Which tokens to use? Investigating token reduction in vision transformers

JB Haurum, S Escalera, GW Taylor… - Proceedings of the …, 2023 - openaccess.thecvf.com
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs
more efficient by removing redundant information in the processed tokens. While different …

ViT-NeT: Interpretable vision transformers with neural tree decoder

S Kim, J Nam, BC Ko - International conference on machine …, 2022 - proceedings.mlr.press
Vision transformers (ViTs), which have demonstrated a state-of-the-art performance in image
classification, can also visualize global interpretations through attention-based contributions …