Sparse r-cnn: End-to-end object detection with learnable proposals

Y Zhao, W Lv, S Xu, J Wei, G Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

The YOLO series has become the most popular framework for real-time object detection due
to its reasonable trade-off between speed and accuracy. However, we observe that the …

Cited by 193 Related articles All 2 versions

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Cited by 7 Related articles All 2 versions

Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in Neural …, 2024 - proceedings.neurips.cc

Deriving reliable region-word alignment from image-text pairs is critical to learn object-level
vision-language representations for open-vocabulary object detection. Existing methods …

Cited by 19 Related articles All 4 versions

Sparse semi-detr: Sparse learnable queries for semi-supervised object detection

T Shehzadi, KA Hashmi, D Stricker… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this paper, we address the limitations of the DETR-based semi-supervised object detection
(SSOD) framework, particularly focusing on the challenges posed by the quality of object …

Cited by 4 Related articles All 4 versions

ClusterFormer: Clustering As A Universal Visual Learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

Cited by 14 Related articles All 4 versions

Egtr: Extracting graph from transformer for scene graph generation

J Im, JY Nam, N Park, H Lee… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Scene Graph Generation (SGG) is a challenging task of detecting objects and
predicting relationships between objects. After DETR was developed, one-stage SGG …

Cited by 2 Related articles All 3 versions

Joint discriminative representation learning for end-to-end person search

P Zhang, X Yu, X Bai, C Wang, J Zheng, X Ning - Pattern Recognition, 2024 - Elsevier

Person search simultaneously detects and retrieves a query person from uncropped scene
images. Existing methods are either two-step or end-to-end. The former employs two …

Cited by 13 Related articles All 3 versions

Yolov10: Real-time end-to-end object detection

A Wang, H Chen, L Liu, K Chen, Z Lin, J Han… - arXiv preprint arXiv …, 2024 - arxiv.org

Over the past years, YOLOs have emerged as the predominant paradigm in the field of
real-time object detection owing to their effective balance between computational cost and …

Cited by 6 Related articles All 2 versions

Hyperbolic deep learning in computer vision: A survey

P Mettes, M Ghadimi Atigh, M Keller-Ressel… - International Journal of …, 2024 - Springer

Deep representation learning is a ubiquitous part of modern computer vision. While
Euclidean space has been the de facto standard manifold for learning visual …

Cited by 11 Related articles All 2 versions

SCA-YOLO: A new small object detection model for UAV images

S Zeng, W Yang, Y Jiao, L Geng, X Chen - The Visual Computer, 2024 - Springer

Object detection from UAV (unmanned aerial vehicle) images is a crucial and challenging
task in the field of computer vision. The task suffers from the difficulties of small dense …

Cited by 11 Related articles All 2 versions
