V3Det: Vast vocabulary visual detection dataset

J Wang, P Zhang, T Chu, Y Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in detecting arbitrary objects in the real world are trained and evaluated
on object detection datasets with a relatively restricted vocabulary. To facilitate the …

Vision-centric BEV perception: A survey

Y Ma, T Wang, X Bai, H Yang, Y Hou, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …

Shuffle transformer: Rethinking spatial shuffle for vision transformer

Z Huang, Y Ben, G Luo, P Cheng, G Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
Very recently, Window-based Transformers, which computed self-attention within non-
overlapping local windows, demonstrated promising results on image classification …

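Key issues in vision Transformer research: Current status and prospects

Y Tian, Y Wang, J Wang, X Wang, FY Wang - Acta Automatica Sinica, 2022 - aas.net.cn
The long-range modeling and parallel computation capabilities of the Transformer have brought it
great success in natural language processing, and it has gradually expanded into computer vision and other fields. Taking the classification task as its entry point, this paper introduces typical vision Transformer …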

ClusterFormer: clustering as a universal visual learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in neural …, 2024 - proceedings.neurips.cc
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

Small object detection via coarse-to-fine proposal generation and imitation learning

X Yuan, G Cheng, K Yan, Q Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The past few years have witnessed the immense success of object detection, while current
excellent detectors struggle on tackling size-limited instances. Concretely, the well-known …

TransVOD: end-to-end video object detection with spatial-temporal transformers

Q Zhou, X Li, L He, Y Yang, G Cheng… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …

UniHCP: A unified model for human-centric perceptions

Y Ci, Y Wang, M Chen, S Tang, L Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection,
person re-identification, etc.) play a key role in industrial applications of visual models. While …

ADAPT: Efficient multi-agent trajectory prediction with adaptation

G Aydemir, AK Akan, F Güney - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Forecasting future trajectories of agents in complex traffic scenes requires reliable and
efficient predictions for all agents in the scene. However, existing methods for trajectory …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …