Authors
Jiading Fang, Xiangshan Tan, Shengjie Lin, Hongyuan Mei, Matthew Walter
Publication date
2023/10/21
Workshop paper
2nd Workshop on Language and Robot Learning: Language as Grounding
Description
If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring expressions is challenging: it requires the ability both to parse the 3D structure of the scene and to correctly ground free-form language in the presence of distraction and clutter. We propose Transcribe3D, a simple yet effective approach to interpreting 3D referring expressions, which converts 3D scene geometry into a textual representation and takes advantage of the common-sense reasoning capability of large language models (LLMs) to make inferences about the objects in the scene and their interactions. We experimentally demonstrate that employing LLMs in this zero-shot fashion outperforms contemporary methods. We then improve upon the zero-shot version of Transcribe3D by finetuning with self-correction to improve generalization. We show preliminary results on the Referit3D dataset with state-of-the-art performance. We also show that our method enables real robots to perform pick-and-place tasks given queries that contain challenging referring expressions.
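To make the core idea concrete, below is a minimal sketch of the transcription step the abstract describes: turning detected 3D objects into a textual scene description that an LLM can reason over. The object schema, prompt wording, and `scene_to_text` helper are illustrative assumptions, not the authors' actual format.

```python
# Hypothetical sketch: serialize 3D scene geometry (object labels and box
# centers) into plain text, then append a referring-expression query for an LLM.

def scene_to_text(objects):
    """Render each detected object as one line: id, label, and 3D center (meters)."""
    lines = []
    for obj in objects:
        x, y, z = obj["center"]
        lines.append(f'object {obj["id"]}: {obj["label"]} at ({x:.2f}, {y:.2f}, {z:.2f})')
    return "\n".join(lines)

# Toy scene with two chairs and a table (made-up coordinates).
scene = [
    {"id": 0, "label": "chair", "center": (1.0, 0.5, 0.0)},
    {"id": 1, "label": "chair", "center": (3.2, 0.5, 0.0)},
    {"id": 2, "label": "table", "center": (1.4, 0.5, 0.0)},
]

# The resulting prompt combines the textual scene with the referring expression;
# the LLM's common-sense reasoning resolves which object id is meant.
prompt = (
    scene_to_text(scene)
    + "\nReferring expression: the chair closest to the table."
    + "\nWhich object id is being referred to?"
)
print(prompt)
```

The key design point is that once geometry is text, grounding a referring expression reduces to question answering, which LLMs can attempt zero-shot.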