Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the rapid development of pre-trained visual foundation …
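
As context for what such fine-tuning looks like in practice, here is a minimal PyTorch sketch of one common strategy surveys of this kind cover: freezing a pre-trained backbone and training only a new task head. The model choice, class count, and hyperparameters are illustrative, not taken from the paper.

```python
# Minimal sketch of adapting a pre-trained visual backbone to a downstream
# task: freeze the backbone, train a new classification head (linear probing).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")   # any pre-trained model
for p in backbone.parameters():
    p.requires_grad = False                           # keep features fixed
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new 10-class head

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 224, 224)      # stand-in for a downstream batch
y = torch.randint(0, 10, (8,))
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
```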

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
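
For reference, the self-attention mechanism this survey centers on can be sketched in a few lines of PyTorch. This is a single head with bare projection matrices; shapes and names are illustrative.

```python
# Minimal single-head scaled dot-product self-attention, the core
# operation behind (vision) transformers.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, tokens, dim); w_*: (dim, dim) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # scaled dot products
    return F.softmax(scores, dim=-1) @ v                     # attention-weighted values

x = torch.randn(2, 197, 64)            # e.g. 196 patch tokens + 1 class token
w = [torch.randn(64, 64) for _ in range(3)]
out = self_attention(x, *w)            # (2, 197, 64)
```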

MiniViT: Compressing vision transformers with weight multiplexing

J Zhang, H Peng, K Wu, M Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Abstract Vision Transformer (ViT) models have recently drawn much attention in computer
vision due to their high model capability. However, ViT models suffer from huge number of …

SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …
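
A simplified sketch of soft token pruning: low-scoring tokens are folded into a single aggregate token rather than discarded outright. The norm-based score below is a stand-in; SPViT learns its token selection under a latency constraint.

```python
# Simplified soft token pruning: keep the highest-scoring tokens and fold
# the rest into one aggregate "package" token instead of dropping them.
import torch

def soft_prune(tokens, keep):
    # tokens: (batch, n, dim)
    scores = tokens.norm(dim=-1)                      # proxy importance per token
    idx = scores.topk(keep, dim=1).indices            # tokens to keep
    keep_mask = torch.zeros_like(scores, dtype=torch.bool).scatter(1, idx, True)
    kept = tokens[keep_mask].view(tokens.shape[0], keep, -1)
    pruned = tokens[~keep_mask].view(tokens.shape[0], -1, tokens.shape[-1])
    package = pruned.mean(dim=1, keepdim=True)        # one aggregate token
    return torch.cat([kept, package], dim=1)          # (batch, keep+1, dim)

out = soft_prune(torch.randn(2, 197, 64), keep=98)    # (2, 99, 64)
```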

One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation

Z Hao, J Guo, K Han, Y Tang, H Hu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Knowledge distillation (KD) has proven to be a highly effective approach for
enhancing model performance through a teacher-student training scheme. However, most …
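
For background, the classic logit-based KD loss that such methods build on looks roughly like this (temperature and weighting are illustrative). One-for-all's focus is making distillation of this kind work across heterogeneous teacher/student architectures, which the plain loss below does not address.

```python
# Classic teacher-student distillation loss: the student matches the
# teacher's softened output distribution plus the ground-truth labels.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale softened gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 100), torch.randn(8, 100)
loss = kd_loss(s, t, torch.randint(0, 100, (8,)))
```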

Gold-YOLO: Efficient object detector via gather-and-distribute mechanism

C Wang, W He, Y Nie, J Guo, C Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
In the past few years, YOLO-series models have emerged as the leading approaches in the area
of real-time object detection. Many studies have pushed the baseline to a higher level by …

I-ViT: Integer-only quantization for efficient vision transformer inference

Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …
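
As background, a symmetric int8 quantize/dequantize round trip is sketched below. I-ViT goes further by keeping the entire inference pipeline in integer arithmetic; this sketch only shows the basic mapping between floats and int8.

```python
# Symmetric per-tensor int8 quantization: map floats to [-128, 127] with a
# single scale, then recover an approximation by multiplying back.
import torch

def quantize_int8(x):
    scale = x.abs().max() / 127.0                   # one scale per tensor
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

x = torch.randn(4, 8)
q, s = quantize_int8(x)
err = (dequantize(q, s) - x).abs().max()            # small rounding error
```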

DiffRate: Differentiable compression rate for efficient vision transformers

M Chen, W Shao, P Xu, M Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning
(dropping) or merging tokens. It is an important but challenging task. Although recent …
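
To make the pruning/merging idea concrete, here is a toy greedy token-merging loop that averages the most similar token pair until a target count is reached. DiffRate's actual contribution, learning the per-layer compression rate differentiably, is not modeled here.

```python
# Toy token compression by merging: repeatedly average the most similar
# pair of tokens until only `target` tokens remain.
import torch
import torch.nn.functional as F

def merge_tokens(tokens, target):
    # tokens: (n, dim) for a single image
    while tokens.shape[0] > target:
        sim = F.normalize(tokens, dim=-1) @ F.normalize(tokens, dim=-1).T
        sim.fill_diagonal_(-1.0)                        # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.shape[1])  # most similar pair
        merged = (tokens[i] + tokens[j]) / 2
        keep = [k for k in range(tokens.shape[0]) if k not in (i, j)]
        tokens = torch.cat([tokens[keep], merged[None]], dim=0)
    return tokens

out = merge_tokens(torch.randn(197, 64), target=99)     # (99, 64)
```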