Recent years have seen a phenomenal rise in the performance and applications of transformer neural networks. The family of transformer networks, including Bidirectional …
Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism. Thanks to its strong representation …
Abstract Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability. However, ViT models suffer from a huge number of …
Abstract Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its …
Abstract Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most …
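As a minimal sketch of the teacher-student scheme this abstract refers to: knowledge distillation typically combines a soft-target loss (student matching the teacher's temperature-scaled output distribution) with the usual hard-label cross-entropy. The function names, temperature `T=4.0`, and weighting `alpha` below are illustrative assumptions, not the paper's specific formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of soft-target cross-entropy (vs. teacher) and hard-label CE.

    The T**2 factor keeps soft-loss gradient magnitudes comparable across
    temperatures, following the standard distillation recipe.
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * (T ** 2)
    idx = np.arange(len(labels))
    hard = -np.log(softmax(student_logits)[idx, labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

In practice the teacher logits come from a frozen, larger pretrained model, and only the student is updated against this combined loss.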
C Wang, W He, Y Nie, J Guo, C Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
In the past years, YOLO-series models have emerged as the leading approaches in the area of real-time object detection. Many studies have pushed the baseline to a higher level by …
Z Li, Q Gu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Abstract Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. However, these models have considerable storage and …
Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent …
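As a hedged illustration of the pruning side of token compression described above: one common heuristic is to rank non-CLS tokens by the attention they receive from the CLS token and keep only the top fraction. The function name, the CLS-attention scoring rule, and `keep_ratio` are assumptions for this sketch, not the specific method of the cited work.

```python
import numpy as np

def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Drop low-importance tokens from one ViT layer's sequence.

    tokens:   (N, D) array, with the CLS token at index 0.
    cls_attn: (N,) attention weights from the CLS query to every token.
    Returns the CLS token plus the top-k non-CLS tokens, k = keep_ratio * (N-1),
    preserving their original order.
    """
    n = tokens.shape[0] - 1          # number of non-CLS tokens
    k = max(1, int(n * keep_ratio))  # how many to keep
    # Rank non-CLS tokens by CLS attention, take the k highest.
    order = np.argsort(cls_attn[1:])[::-1][:k] + 1
    keep = np.concatenate(([0], np.sort(order)))  # always keep CLS, restore order
    return tokens[keep]
```

Merging-based compression replaces the drop step with averaging similar token pairs instead of discarding them, trading a little extra compute for less information loss.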