A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

PTQD: Accurate post-training quantization for diffusion models

Y He, L Liu, J Liu, W Wu, H Zhou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have recently dominated image synthesis and other related generative
tasks. However, the iterative denoising process is computationally expensive at inference …

Which tokens to use? Investigating token reduction in vision transformers

JB Haurum, S Escalera, GW Taylor… - Proceedings of the …, 2023 - openaccess.thecvf.com
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs
more efficient by removing redundant information in the processed tokens. While different …
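
As a concrete illustration of the token-reduction idea this snippet describes, the sketch below prunes patch tokens by their attention weight from the [CLS] token. This is a minimal example of one common strategy, not any specific method compared in the paper; the function name and the keep_ratio parameter are illustrative.

    import numpy as np

    def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
        """Keep the top-k patch tokens ranked by their attention weight
        from the [CLS] token; drop the rest (illustrative strategy).

        tokens:   (N, D) array, row 0 is the [CLS] token
        cls_attn: (N,) attention weights from [CLS] to every token
        """
        n_patch = tokens.shape[0] - 1
        k = max(1, int(n_patch * keep_ratio))
        # Rank patch tokens (indices 1..N-1) by [CLS] attention, keep top-k
        order = np.argsort(cls_attn[1:])[::-1][:k] + 1
        keep = np.concatenate(([0], np.sort(order)))  # always keep [CLS]
        return tokens[keep]

    # toy example: 197 tokens (1 [CLS] + 196 patches), dim 64
    rng = np.random.default_rng(0)
    toks = rng.normal(size=(197, 64))
    attn = rng.random(197)
    print(prune_tokens(toks, attn).shape)  # (99, 64): [CLS] + 98 kept patches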

RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers

Z Li, J Xiao, L Yang, Q Gu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Post-training quantization (PTQ), which requires only a tiny dataset for calibration
without end-to-end retraining, is a lightweight and practical model compression technique …
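
The snippet's definition of PTQ, a tiny calibration set and no end-to-end retraining, can be made concrete with a minimal min-max calibration sketch. This illustrates generic per-tensor PTQ, not RepQ-ViT's scale reparameterization; all names here are illustrative.

    import numpy as np

    def calibrate_scale(calib_batches, n_bits=8):
        """Derive a per-tensor quantization scale from a small calibration
        set by tracking the max absolute activation value (min-max PTQ)."""
        max_abs = max(np.abs(x).max() for x in calib_batches)
        qmax = 2 ** (n_bits - 1) - 1          # e.g. 127 for int8
        return max_abs / qmax

    def quantize(x, scale, n_bits=8):
        qmax = 2 ** (n_bits - 1) - 1
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q.astype(np.int8)              # dequantize via q * scale

    rng = np.random.default_rng(0)
    calib = [rng.normal(size=(32, 768)) for _ in range(8)]  # tiny calibration set
    s = calibrate_scale(calib)
    q = quantize(rng.normal(size=(32, 768)), s)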

I-ViT: Integer-only quantization for efficient vision transformer inference

Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) have achieved state-of-the-art performance on various
computer vision applications. However, these models have considerable storage and …
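
Integer-only inference, as in the title above, means every runtime operation uses integer arithmetic, with floating-point scales folded into a fixed-point multiplier. The sketch below shows this for a single linear layer; it is a generic requantization scheme, not I-ViT's specific shift-based operators.

    import numpy as np

    def int_linear(q_x, q_w, s_x, s_w, s_out, n_bits=8):
        """Integer-only linear layer: int8 x int8 -> int32 accumulate,
        then requantize to int8 with a fixed-point multiplier, so no
        float operations are needed at inference time."""
        acc = q_x.astype(np.int32) @ q_w.astype(np.int32)
        # Fold the three scales into one fixed-point multiplier m * 2^-shift
        shift = 31
        m = int(round(s_x * s_w / s_out * (1 << shift)))
        qmax = 2 ** (n_bits - 1) - 1
        out = (acc.astype(np.int64) * m) >> shift   # fixed-point rescale
        return np.clip(out, -qmax - 1, qmax).astype(np.int8)

    rng = np.random.default_rng(0)
    q_x = rng.integers(-128, 128, size=(4, 64), dtype=np.int8)
    q_w = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
    y = int_linear(q_x, q_w, s_x=0.02, s_w=0.01, s_out=0.1)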

NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers

Y Liu, H Yang, Z Dong, K Keutzer… - Proceedings of the …, 2023 - openaccess.thecvf.com
The complicated architecture and high training cost of vision transformers motivate the
exploration of post-training quantization. However, the heavy-tailed distribution of vision …
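
The title describes the core mechanism: a fixed, pre-sampled noisy bias is added to activations before quantization and removed afterwards, reshaping the rounding error on heavy-tailed inputs. The sketch below is a loose paraphrase of that idea under an assumed per-channel uniform bias, not the paper's exact formulation.

    import numpy as np

    def noisy_quant(x, scale, noise, n_bits=8):
        """Add a fixed, pre-sampled noisy bias before quantization and
        subtract it after dequantization; since the bias is known, the
        subtraction can be folded into the next layer at deployment."""
        qmax = 2 ** (n_bits - 1) - 1
        q = np.clip(np.round((x + noise) / scale), -qmax - 1, qmax)
        return q * scale - noise  # dequantized output with bias removed

    rng = np.random.default_rng(0)
    x = rng.standard_t(df=3, size=(32, 768))              # heavy-tailed activations
    scale = np.abs(x).max() / 127
    noise = rng.uniform(-scale / 2, scale / 2, size=768)  # fixed per-channel bias
    y = noisy_quant(x, scale, noise)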

Stitchable neural networks

Z Pan, J Cai, B Zhuang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
The public model zoo of powerful pretrained model families (e.g., ResNet/DeiT) has
reached an unprecedented scope, which significantly …

Jumping through local minima: Quantization in the loss landscape of vision transformers

N Frumkin, D Gope… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Quantization scale and bit-width are the most important parameters when considering how
to quantize a neural network. Prior work focuses on optimizing quantization scales in a …
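
The snippet names the two knobs precisely: bit-width fixes the integer grid, and scale maps real values onto it. A minimal uniform symmetric quantizer makes the role of each explicit; this illustrates the parameters being optimized, not the paper's search procedure itself.

    import numpy as np

    def uniform_quantize(x, scale, n_bits):
        """Uniform symmetric quantizer: bit-width sets the integer range,
        scale maps real values onto it; reconstruction error depends on
        the interplay of both parameters."""
        qmax = 2 ** (n_bits - 1) - 1
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale  # dequantized value

    x = np.linspace(-1, 1, 9)
    for bits in (4, 8):
        s = np.abs(x).max() / (2 ** (bits - 1) - 1)
        print(bits, np.abs(uniform_quantize(x, s, bits) - x).max())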

PackQViT: Faster sub-8-bit vision transformers via full and packed quantization on the mobile

P Dong, L Lu, C Wu, C Lyu, G Yuan… - Advances in Neural …, 2024 - proceedings.neurips.cc
While Vision Transformers (ViTs) have undoubtedly made impressive strides in
computer vision (CV), their intricate network structures necessitate substantial computation …
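
Sub-8-bit values do not align to byte boundaries, so packed quantization stores several of them per byte. The sketch below packs pairs of signed 4-bit values into single bytes and recovers them losslessly; it illustrates the packing idea only, not the paper's mobile kernels.

    import numpy as np

    def pack_int4(q):
        """Pack pairs of signed 4-bit values into single bytes:
        even indices in the low nibble, odd indices in the high nibble."""
        u = (q.astype(np.uint8) & 0x0F).reshape(-1, 2)  # two's-complement nibbles
        return (u[:, 0] | (u[:, 1] << 4)).astype(np.uint8)

    def unpack_int4(packed):
        lo = (packed & 0x0F).astype(np.int8)
        hi = ((packed >> 4) & 0x0F).astype(np.int8)
        # sign-extend the 4-bit two's-complement nibbles back to int8
        return np.stack([np.where(lo > 7, lo - 16, lo),
                         np.where(hi > 7, hi - 16, hi)], axis=1).reshape(-1)

    q = np.array([-8, -1, 0, 3, 7, -4], dtype=np.int8)
    assert (unpack_int4(pack_int4(q)) == q).all()  # lossless round trip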