We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks. These tasks often …
Open-vocabulary generalization requires robotic systems to perform tasks involving complex and diverse environments and task goals. While the recent advances in vision language …
Gestures serve as a fundamental and significant mode of non-verbal communication among humans. Deictic gestures (such as pointing towards an object), in particular, offer valuable …
In this study, we are interested in imbuing robots with the capability of physically-grounded task planning. Recent advancements have shown that large language models (LLMs) …
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary, eg" can you get me the pink stuffed whale?" to their …
Z Wu, Z Wang, X Xu, J Lu, H Yan - arXiv preprint arXiv:2307.01848, 2023 - arxiv.org
Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments. Recent large language …
Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models …
We tackle the problem of learning complex, general behaviors directly in the real world. We propose an approach for robots to efficiently learn manipulation skills using only a handful of …
Sharing autonomy between robots and human operators could facilitate data collection of robotic task demonstrations to continuously improve learned models. Yet, the means to …