DETRs beat YOLOs on real-time object detection

Y Zhao, W Lv, S Xu, J Wei, G Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The YOLO series has become the most popular framework for real-time object detection due
to its reasonable trade-off between speed and accuracy. However, we observe that the …

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

CoDet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deriving reliable region-word alignment from image-text pairs is critical to learn object-level
vision-language representations for open-vocabulary object detection. Existing methods …

Sparse Semi-DETR: Sparse learnable queries for semi-supervised object detection

T Shehzadi, KA Hashmi, D Stricker… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we address the limitations of the DETR-based semi-supervised object detection
(SSOD) framework, particularly focusing on the challenges posed by the quality of object …

ClusterFomer: Clustering As A Universal Visual Learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

EGTR: Extracting graph from transformer for scene graph generation

J Im, JY Nam, N Park, H Lee… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Scene Graph Generation (SGG) is a challenging task of detecting objects and
predicting relationships between objects. After DETR was developed, one-stage SGG …

Joint discriminative representation learning for end-to-end person search

P Zhang, X Yu, X Bai, C Wang, J Zheng, X Ning - Pattern Recognition, 2024 - Elsevier
Person search simultaneously detects and retrieves a query person from uncropped scene
images. Existing methods are either two-step or end-to-end. The former employs two …

YOLOv10: Real-time end-to-end object detection

A Wang, H Chen, L Liu, K Chen, Z Lin, J Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-
time object detection owing to their effective balance between computational cost and …

Hyperbolic deep learning in computer vision: A survey

P Mettes, M Ghadimi Atigh, M Keller-Ressel… - International Journal of …, 2024 - Springer
Deep representation learning is a ubiquitous part of modern computer vision. While
Euclidean space has been the de facto standard manifold for learning visual …

SCA-YOLO: A new small object detection model for UAV images

S Zeng, W Yang, Y Jiao, L Geng, X Chen - The Visual Computer, 2024 - Springer
Object detection from UAV (unmanned aerial vehicle) images is a crucial and challenging
task in the field of computer vision. The task suffers from the difficulties of small dense …