Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

R Guan, R Zhang, N Ouyang, J Liu, KL Man… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied perception is essential for intelligent vehicles and robots, enabling more natural
interaction and task execution. However, these advancements currently embrace vision …

Transcrib3D: 3D Referring Expression Resolution through Large Language Models

J Fang, X Tan, S Lin, I Vasiljevic, V Guizilini… - arXiv preprint arXiv …, 2024 - arxiv.org
If robots are to work effectively alongside people, they must be able to interpret natural
language references to objects in their 3D environment. Understanding 3D referring …

Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning

J Fang, X Tan, S Lin, H Mei, M Walter - 2nd Workshop on Language …, 2023 - openreview.net
If robots are to work effectively alongside people, they must be able to interpret natural
language references to objects in their 3D environment. Understanding 3D referring …

3D-STMN: Dependency-driven superpoint-text matching network for end-to-end 3D referring expression segmentation

C Wu, Y Ma, Q Chen, H Wang, G Luo, J Ji… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-
stage paradigm, extracting segmentation proposals and then matching them with referring …

Spatiality-guided transformer for 3D dense captioning on point clouds

H Wang, C Zhang, J Yu, W Cai - arXiv preprint arXiv:2204.10688, 2022 - arxiv.org
Dense captioning in 3D point clouds is an emerging vision-and-language task involving
object-level 3D scene understanding. Apart from coarse semantic class prediction and …

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model

KC Huang, X Li, L Qi, S Yan, MH Yang - arXiv preprint arXiv:2405.17427, 2024 - arxiv.org
Recent advancements in multimodal large language models (LLMs) have shown their
potential in various domains, especially concept reasoning. Despite these developments …

Language-guided 3D object detection in point cloud for autonomous driving

W Cheng, J Yin, W Li, R Yang, J Shen - arXiv preprint arXiv:2305.15765, 2023 - arxiv.org
This paper addresses the problem of 3D referring expression comprehension (REC) in
autonomous driving scenario, which aims to ground a natural language to the targeted …

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Y Tang, X Han, X Li, Q Yu, Y Hao, L Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging
Large Language Models (LLMs) with images using a simple projector. Inspired by their …

Vote2Cap-DETR++: Decoupling localization and describing for end-to-end 3D dense captioning

S Chen, H Zhu, M Li, X Chen, P Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
3D dense captioning requires a model to translate its understanding of an input 3D scene
into several captions associated with different object regions. Existing methods adopt a …

ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes

S Kato, S Kurita, C Chu… - Findings of the Association …, 2023 - aclanthology.org
3D referring expression comprehension is a task to ground text representations
onto objects in 3D scenes. It is a crucial task for indoor household robots or augmented …