FLatten Transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …

Adaptive rotated convolution for rotated object detection

Y Pu, Y Wang, Z Xia, Y Han, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Rotated object detection aims to identify and locate objects in images with arbitrary
orientation. In this scenario, the oriented directions of objects vary considerably across …

Rank-DETR for high quality object detection

Y Pu, W Liang, Y Hao, Y Yuan… - Advances in …, 2024 - proceedings.neurips.cc
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …

Degradation-resistant unfolding network for heterogeneous image fusion

C He, K Li, G Xu, Y Zhang, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Heterogeneous image fusion (HIF) techniques aim to enhance image quality by merging
complementary information from images captured by different sensors. Among these …

GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Mask grounding for referring image segmentation

YX Chng, H Zheng, Y Han, X Qiu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Referring Image Segmentation (RIS) is a challenging task that requires an
algorithm to segment objects referred by free-form language expressions. Despite significant …

Fine-grained recognition with learnable semantic data augmentation

Y Pu, Y Han, Y Wang, J Feng, C Deng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Fine-grained image recognition is a longstanding computer vision challenge that focuses on
differentiating objects belonging to multiple subordinate categories within the same meta …

EfficientTrain: Exploring generalized curriculum learning for training visual backbones

Y Wang, Y Yue, R Lu, T Liu, Z Zhong… - Proceedings of the …, 2023 - openaccess.thecvf.com
The superior performance of modern deep networks usually comes with a costly training
procedure. This paper presents a new curriculum learning approach for the efficient training …

Deep incubation: Training large models by divide-and-conquering

Z Ni, Y Wang, J Yu, H Jiang, Y Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent years have witnessed a remarkable success of large deep learning models.
However, training these models is challenging due to high computational costs, painfully …

Agent attention: On the integration of softmax and linear attention

D Han, T Ye, Y Han, Z Xia, S Song, G Huang - arXiv preprint arXiv …, 2023 - arxiv.org
The attention module is the key component in Transformers. While the global attention
mechanism offers high expressiveness, its excessive computational cost restricts its …