Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description. However, existing methods only …
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D …
Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and …
The two popular datasets ScanRefer [20] and ReferIt3D [5] connect natural language to real-world 3D scenes. In this paper, we curate a complementary dataset extending both the …
Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or …
R Guan, L Jia, F Yang, S Yao, E Purwanto… - arXiv preprint arXiv …, 2024 - arxiv.org
The perception of waterways based on human intent holds significant importance for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water …
Z Qian, Y Ma, J Ji, X Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Referring 3D instance segmentation is a challenging task aimed at accurately segmenting a target instance within a 3D scene based on a given referring expression. However, previous …
Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
3D visual language reasoning plays an important role in effective human-computer interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D …