ViViT: A video vision transformer

A Arnab, M Dehghani, G Heigold… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present pure-transformer based models for video classification, drawing upon the recent
success of such models in image classification. Our model extracts spatio-temporal tokens …

LeViT: a vision transformer in ConvNet's clothing for faster inference

B Graham, A El-Nouby, H Touvron… - Proceedings of the …, 2021 - openaccess.thecvf.com
We design a family of image classification architectures that optimize the trade-off between
accuracy and efficiency in a high-speed regime. Our work exploits recent findings in …

Training data-efficient image transformers & distillation through attention

H Touvron, M Cord, M Douze, F Massa… - International …, 2021 - proceedings.mlr.press
Recently, neural networks purely based on attention were shown to address image
understanding tasks such as image classification. These high-performing vision …

Revisiting ResNets: Improved training and scaling strategies

I Bello, W Fedus, X Du, ED Cubuk… - Advances in …, 2021 - proceedings.neurips.cc
Novel computer vision architectures monopolize the spotlight, but the impact of the model
architecture is often conflated with simultaneous changes to training methodology and …

Generative adversarial transformers

DA Hudson, L Zitnick - International conference on machine …, 2021 - proceedings.mlr.press
We introduce the GANsformer, a novel and efficient type of transformer, and explore it for the
task of visual generative modeling. The network employs a bipartite structure that enables …

Rice leaf disease identification by residual-distilled transformer

C Zhou, Y Zhong, S Zhou, J Song, W Xiang - Engineering Applications of …, 2023 - Elsevier
As the worldwide planting crop, rice feeds nearly half of the world's population. However, the
continuous spread of diseases is threatening rice production. It is of great practical value to …

Augmenting convolutional networks with attention-based aggregation

H Touvron, M Cord, A El-Nouby, P Bojanowski… - arXiv preprint arXiv …, 2021 - arxiv.org
We show how to augment any convolutional network with an attention-based global map to
achieve non-local reasoning. We replace the final average pooling by an attention-based …

An information theoretic approach for attention-driven face forgery detection

K Sun, H Liu, T Yao, X Sun, S Chen, S Ding… - European Conference on …, 2022 - Springer
Recently, Deepfake arises as a powerful tool to fool the existing real-world face detection
systems, which has received wide attention in both academia and society. Most existing …

Interpretable CNN-multilevel attention transformer for rapid recognition of pneumonia from chest X-ray images

S Chen, S Ren, G Wang, M Huang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
Chest imaging plays an essential role in diagnosing and predicting patients with COVID-19
with evidence of worsening respiratory status. Many deep learning-based approaches for …

Local-aware global attention network for person re-identification based on body and hand images

NL Baisa - Journal of Visual Communication and Image …, 2024 - Elsevier
Learning representative, robust and discriminative information from images is essential for
effective person re-identification (Re-Id). In this paper, we propose a compound approach for …