RT-2: Vision-language-action models transfer web knowledge to robotic control

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

RT-2: Vision-language-action models transfer web knowledge to robotic control

B Zitkovich, T Yu, S Xu, P Xu, T Xiao… - … on Robot Learning, 2023 - proceedings.mlr.press
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

A survey of optimization-based task and motion planning: From classical to learning approaches

Z Zhao, S Cheng, Y Ding, Z Zhou… - IEEE/ASME …, 2024 - ieeexplore.ieee.org
Task and motion planning (TAMP) integrates high-level task planning and low-level motion
planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic …

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

DoReMi: Grounding language model by detecting and recovering from plan-execution misalignment

Y Guo, YJ Wang, L Zha, J Chen - 2024 IEEE/RSJ International …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) encode a vast amount of semantic knowledge and possess
remarkable understanding and reasoning capabilities. Previous work has explored how to …

Integrating action knowledge and LLMs for task planning and situation handling in open worlds

Y Ding, X Zhang, S Amiri, N Cao, H Yang… - Autonomous …, 2023 - Springer
Task planning systems have been developed to help robots use human knowledge (about
actions) to complete long-horizon tasks. Most of them have been developed for “closed …

Object-centric instruction augmentation for robotic manipulation

J Wen, Y Zhu, M Zhu, J Li, Z Xu, Z Che, C Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans interpret scenes by recognizing both the identities and positions of objects in their
observations. For a robot to perform tasks such as "pick and place", understanding …

ReplanVLM: Replanning robotic tasks with visual language models

A Mei, GN Zhu, H Zhang, Z Gan - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have gained increasing popularity in robotic task planning
due to their exceptional abilities in text analytics and generation, as well as their broad …

Guiding Long-Horizon Task and Motion Planning with Vision Language Models

Z Yang, C Garrett, D Fox, T Lozano-Pérez… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLMs) can generate plausible high-level plans when prompted
with a goal, the context, an image of the scene, and any planning constraints. However …

GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

A Mei, J Wang, GN Zhu, Z Gan - arXiv preprint arXiv:2405.13751, 2024 - arxiv.org
With their prominent scene understanding and reasoning capabilities, pre-trained visual
language models (VLMs) such as GPT-4V have attracted increasing attention in robotic task …