Closed-loop open-vocabulary mobile manipulation with GPT-4V

P Zhi, Z Zhang, M Han, Z Zhang, Z Li, Z Jiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robot navigation and manipulation in open environments require reasoning
and replanning with closed-loop feedback. We present COME-robot, the first closed-loop …

AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation

C Jin, W Tan, J Yang, B Liu, R Song, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a novel framework for learning high-level cognitive capabilities in robot
manipulation tasks, such as making a smiley face using building blocks. These tasks often …

GPT-4V(ision) for robotics: Multimodal task planning from human demonstration

N Wake, A Kanehira, K Sasabuchi, J Takamatsu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), by integrating observations of human actions to facilitate robotic manipulation. This …

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

Language-grounded dynamic scene graphs for interactive object search with mobile manipulation

D Honerkamp, M Buchner, F Despinoy… - arXiv preprint arXiv …, 2024 - arxiv.org
To fully leverage the capabilities of mobile manipulation robots, it is imperative that they are
able to autonomously execute long-horizon tasks in large unexplored environments. While …

HomeRobot: Open-vocabulary mobile manipulation

S Yenamandra, A Ramachandran, K Yadav… - arXiv preprint arXiv …, 2023 - arxiv.org
HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a
wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile …

MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking
and decision-making abilities across various tasks. However, existing zero-shot agents for …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - arXiv preprint arXiv:2403.03174, 2024 - arxiv.org
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …