Closed-loop open-vocabulary mobile manipulation with GPT-4V

P Zhi, Z Zhang, M Han, Z Zhang, Z Li, Z Jiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robot navigation and manipulation in open environments require reasoning
and replanning with closed-loop feedback. We present COME-robot, the first closed-loop …

AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation

C Jin, W Tan, J Yang, B Liu, R Song, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a novel framework for learning high-level cognitive capabilities in robot
manipulation tasks, such as making a smiley face using building blocks. These tasks often …

GPT-4V(ision) for robotics: Multimodal task planning from human demonstration

N Wake, A Kanehira, K Sasabuchi, J Takamatsu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), by integrating observations of human actions to facilitate robotic manipulation. This …

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

Language-grounded dynamic scene graphs for interactive object search with mobile manipulation

D Honerkamp, M Buchner, F Despinoy… - arXiv preprint arXiv …, 2024 - arxiv.org
To fully leverage the capabilities of mobile manipulation robots, it is imperative that they are
able to autonomously execute long-horizon tasks in large unexplored environments. While …

HomeRobot: Open-vocabulary mobile manipulation

S Yenamandra, A Ramachandran, K Yadav… - arXiv preprint arXiv …, 2023 - arxiv.org
HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a
wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile …

MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking
and decision-making abilities across various tasks. However, existing zero-shot agents for …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - arXiv preprint arXiv:2403.03174, 2024 - arxiv.org
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …