If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring …
Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or …
Two popular datasets, ScanRefer [20] and ReferIt3D [5], connect natural language to real-world 3D scenes. In this paper, we curate a complementary dataset extending both the …
Recent research has demonstrated the significant potential of Large Language Models (LLMs) in handling challenging tasks within 3D scenes. However, current models are constrained to …
For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to …
Household robots operate in the same space for years. Such robots incrementally build dynamic maps that can be used for tasks requiring remote object localization. However, …
Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision …
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (MLLM) designed for embodied interaction, exploring universal 3D object understanding with 3D …
F Matsuzawa, Y Qiu, K Iwata… - … on Robotics and …, 2023 - ieeexplore.ieee.org
We introduce a new task of question generation to eliminate the uncertainty of referring expressions in 3D indoor environments (3D-REQ). Referring to an object using natural …