A comprehensive survey of transformers for computer vision

S Jamil, M Jalil Piran, OJ Kwon - Drones, 2023 - mdpi.com
As a special type of transformer, vision transformers (ViTs) can be used for various computer
vision (CV) applications. Convolutional neural networks (CNNs) have several potential …

M³ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design

Z Fan, R Sarkar, Z Jiang, T Chen… - Advances in …, 2022 - proceedings.neurips.cc
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often
lets those tasks learn better jointly. Multi-tasking models have become successful and often …

PPT: Token-pruned pose transformer for monocular and multi-view human pose estimation

H Ma, Z Wang, Y Chen, D Kong, L Chen, X Liu… - … on Computer Vision, 2022 - Springer
Recently, the vision transformer and its variants have played an increasingly important role
in both monocular and multi-view human pose estimation. Considering image patches as …

ViTCoD: Vision transformer acceleration via dedicated algorithm and accelerator co-design

H You, Z Sun, H Shi, Z Yu, Y Zhao… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved state-of-the-art performance on various vision
tasks. However, ViTs' self-attention module is still arguably a major bottleneck, limiting their …

Deep convolutional pooling transformer for deepfake detection

T Wang, H Cheng, KP Chow, L Nie - ACM transactions on multimedia …, 2023 - dl.acm.org
Recently, Deepfake has drawn considerable public attention due to security and privacy
concerns in social media digital forensics. As the wildly spreading Deepfake videos on the …

Model quantization and hardware acceleration for vision transformers: A comprehensive survey

D Du, G Gong, X Chu - arXiv preprint arXiv:2405.00314, 2024 - arxiv.org
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a
promising alternative to convolutional neural networks (CNNs) in several vision-related …

A hybrid model for driver emotion detection using feature fusion approach

SB Sukhavasi, SB Sukhavasi, K Elleithy… - International journal of …, 2022 - mdpi.com
Machine and deep learning techniques are two branches of artificial intelligence that have
proven very efficient in solving advanced human problems. The automotive industry is …

ViTA: A vision transformer inference accelerator for edge applications

S Nag, G Datta, S Kundu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer,
have recently gained significant traction in computer vision tasks due to their ability to …

TRON: Transformer neural network acceleration with non-coherent silicon photonics

S Afifi, F Sunny, M Nikdast, S Pasricha - Proceedings of the Great Lakes …, 2023 - dl.acm.org
Transformer neural networks are rapidly being integrated into state-of-the-art solutions for
natural language processing (NLP) and computer vision. However, the complex structure of …

Edge-MoE: Memory-efficient multi-task vision transformer architecture with task-level sparsity via mixture-of-experts

R Sarkar, H Liang, Z Fan, Z Wang… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
The computer vision community is embracing two promising learning paradigms: the Vision
Transformer (ViT) and Multi-task Learning (MTL). ViT models show extraordinary …