T2I-CompBench: A comprehensive benchmark for open-world compositional text-to-image generation

K Huang, K Sun, E Xie, Z Li… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the stunning ability to generate high-quality images by recent text-to-image models,
current approaches often struggle to effectively compose objects with different attributes and …

Detecting twenty-thousand classes using image-level supervision

X Zhou, R Girdhar, A Joulin, P Krähenbühl… - European Conference on …, 2022 - Springer
Current object detectors are limited in vocabulary size due to the small scale of detection
datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Detecting everything in the open world: Towards universal object detection

Z Wang, Y Li, X Chen, SN Lim… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we formally address universal object detection, which aims to detect every
scene and predict every category. The dependence on human annotations, the limited …

Open-vocabulary object detection via vision and language knowledge distillation

X Gu, TY Lin, W Kuo, Y Cui - arXiv preprint arXiv:2104.13921, 2021 - arxiv.org
We aim at advancing open-vocabulary object detection, which detects objects described by
arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly …

Human activity recognition (HAR) using deep learning: Review, methodologies, progress and future research directions

P Kumar, S Chauhan, LK Awasthi - Archives of Computational Methods in …, 2024 - Springer
Human activity recognition is essential in many domains, including the medical and smart
home sectors. Using deep learning, we conduct a comprehensive survey of current state …

Towards large-scale 3D representation learning with multi-dataset point prompt training

X Wu, Z Tian, X Wen, B Peng, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rapid advancement of deep learning models is often attributed to their ability to leverage
massive training data. In contrast, such privilege has not yet fully benefited 3D deep learning …

DAMO-YOLO: A report on real-time object detection design

X Xu, Y Jiang, W Chen, Y Huang, Y Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO,
which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO is …

HRS-Bench: Holistic, reliable and scalable benchmark for text-to-image models

EM Bakr, P Sun, X Shen, FF Khan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Designing robust text-to-image (T2I) models has been extensively explored in recent years,
especially with the emergence of diffusion models, which achieve state-of-the-art results on …

Cascade-DETR: Delving into high-quality universal object detection

M Ye, L Ke, S Li, YW Tai, CK Tang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object localization in general environments is a fundamental part of vision systems. While
dominating on the COCO benchmark, recent Transformer-based detection methods are not …