Sparse R-CNN: End-to-end object detection with learnable proposals

J Wang, P Zhang, T Chu, Y Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent advances in detecting arbitrary objects in the real world are trained and evaluated
on object detection datasets with a relatively restricted vocabulary. To facilitate the …

Cited by 35 Related articles All 5 versions

Vision-centric BEV perception: A survey

Y Ma, T Wang, X Bai, H Yang, Y Hou, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …

Cited by 93 Related articles All 2 versions

Shuffle transformer: Rethinking spatial shuffle for vision transformer

Z Huang, Y Ben, G Luo, P Cheng, G Yu… - arXiv preprint arXiv …, 2021 - arxiv.org

Very recently, Window-based Transformers, which computed self-attention within
non-overlapping local windows, demonstrated promising results on image classification …

Cited by 179 Related articles All 2 versions

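Key issues in vision Transformer research: Current status and prospects

Y Tian, Y Wang, J Wang, X Wang, FY Wang - Acta Automatica Sinica, 2022 - aas.net.cn

The Transformer's long-range modeling and parallel computation capabilities have brought it great
success in natural language processing, and it has gradually been extended to computer vision and other fields. Taking the classification task as a starting point, this paper introduces typical vision Transformers …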

Cited by 24 Related articles All 3 versions

Clusterfomer: clustering as a universal visual learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in neural …, 2024 - proceedings.neurips.cc

This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

Cited by 21 Related articles All 5 versions

Small object detection via coarse-to-fine proposal generation and imitation learning

X Yuan, G Cheng, K Yan, Q Zeng… - Proceedings of the …, 2023 - openaccess.thecvf.com

The past few years have witnessed the immense success of object detection, while current
excellent detectors struggle on tackling size-limited instances. Concretely, the well-known …

Cited by 19 Related articles All 5 versions

TransVOD: end-to-end video object detection with spatial-temporal transformers

Q Zhou, X Li, L He, Y Yang, G Cheng… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …

Cited by 103 Related articles All 8 versions

UniHCP: A unified model for human-centric perceptions

Y Ci, Y Wang, M Chen, S Tang, L Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com

Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection,
person re-identification, etc.) play a key role in industrial applications of visual models. While …

Cited by 32 Related articles All 5 versions

ADAPT: Efficient multi-agent trajectory prediction with adaptation

G Aydemir, AK Akan, F Güney - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Forecasting future trajectories of agents in complex traffic scenes requires reliable and
efficient predictions for all agents in the scene. However, existing methods for trajectory …

Cited by 29 Related articles All 5 versions

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Cited by 58 Related articles All 3 versions