If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring …
C Wu, Y Ma, Q Chen, H Wang, G Luo, J Ji… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring …
Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Apart from coarse semantic class prediction and …
KC Huang, X Li, L Qi, S Yan, MH Yang - arXiv preprint arXiv:2405.17427, 2024 - arxiv.org
Recent advancements in multimodal large language models (LLMs) have shown their potential in various domains, especially concept reasoning. Despite these developments …
W Cheng, J Yin, W Li, R Yang, J Shen - arXiv preprint arXiv:2305.15765, 2023 - arxiv.org
This paper addresses the problem of 3D referring expression comprehension (REC) in autonomous driving scenarios, which aims to ground a natural language to the targeted …
Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their …
S Chen, H Zhu, M Li, X Chen, P Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a …
S Kato, S Kurita, C Chu… - Findings of the Association …, 2023 - aclanthology.org
3D referring expression comprehension is a task to ground text representations onto objects in 3D scenes. It is a crucial task for indoor household robots or augmented …