Open-vocabulary object detection using pseudo caption labels

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Cited by 123 Related articles All 10 versions

[PDF] arxiv.org

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org

As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Cited by 24 Related articles All 7 versions

[PDF] arxiv.org

DST-Det: Simple dynamic self-training for open-vocabulary object detection

S Xu, X Li, S Wu, W Zhang, Y Li, G Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org

Open-vocabulary object detection (OVOD) aims to detect the objects beyond the set of
categories observed during training. This work presents a simple yet effective strategy that …

Cited by 11 Related articles All 3 versions

[PDF] thecvf.com

Hyperbolic learning with synthetic captions for open-world detection

F Kong, Y Chen, J Cai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Open-world detection poses significant challenges as it requires the detection of any object
using either object class labels or free-form texts. Existing related works often use large …

Cited by 4 Related articles All 5 versions

[PDF] arxiv.org

Approaching outside: scaling unsupervised 3D object detection from 2D scene

R Zhang, H Zhang, H Yu, Z Zheng - European Conference on Computer …, 2024 - Springer

The unsupervised 3D object detection is to accurately detect objects in unstructured
environments with no explicit supervisory signals. This task, given sparse LiDAR point …

Cited by 2 Related articles All 8 versions

[PDF] arxiv.org

Mixed-Query Transformer: A Unified Image Segmentation Architecture

P Wang, Z Cai, H Yang, A Swaminathan… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing unified image segmentation models either employ a unified architecture across
multiple tasks but use separate weights tailored to each dataset, or apply a single set of …

Cited by 4 Related articles All 2 versions

Asymmetric Aggregation Network for Accurate Ship Detection in Optical Imagery

Y Zhang, MJ Er - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org

Optical imagery ship detection has achieved significant developments recently. However,
accurate detection in complex scenes and for different-scale ships remains a vital challenge …

Cited by 1 Related articles All 2 versions

[HTML] sciencedirect.com

VisualSiteDiary: A detector-free Vision-Language Transformer model for captioning photologs for daily construction reporting and image retrievals

Y Jung, I Cho, SH Hsu, M Golparvar-Fard - Automation in Construction, 2024 - Elsevier

This paper presents VisualSiteDiary, a Vision Transformer-based image captioning model
which creates human-readable captions for daily progress and work activity log, and …

Retrieval-Augmented Open-Vocabulary Object Detection

J Kim, E Cho, S Kim, HJ Kim - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Open-vocabulary object detection (OVD) has been studied with Vision-Language Models
(VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches …

Cited by 6 Related articles All 3 versions

DST-Det: Open-Vocabulary Object Detection via Dynamic Self-Training

S Xu, X Li, S Wu, W Zhang, Y Tong… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Open-vocabulary object detection (OVOD) aims to detect the objects beyond the set of
classes observed during training. This work introduces a straightforward and efficient …
