GPT-4V(ision) for robotics: Multimodal task planning from human demonstration

N Wake, A Kanehira, K Sasabuchi, J Takamatsu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision),
by integrating observations of human actions to facilitate robotic manipulation. This …

AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation

C Jin, W Tan, J Yang, B Liu, R Song, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a novel framework for learning high-level cognitive capabilities in robot
manipulation tasks, such as making a smiley face using building blocks. These tasks often …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - arXiv preprint arXiv:2403.03174, 2024 - arxiv.org
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …

Gesture-informed robot assistance via foundation models

LH Lin, Y Cui, Y Hao, F Xia, D Sadigh - 7th Annual Conference on …, 2023 - openreview.net
Gestures serve as a fundamental and significant mode of non-verbal communication among
humans. Deictic gestures (such as pointing towards an object), in particular, offer valuable …

Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning

Y Hu, F Lin, T Zhang, L Yi, Y Gao - arXiv preprint arXiv:2311.17842, 2023 - arxiv.org
In this study, we are interested in imbuing robots with the capability of physically-grounded
task planning. Recent advancements have shown that large language models (LLMs) …

Open-world object manipulation using pre-trained vision-language models

A Stone, T Xiao, Y Lu, K Gopalakrishnan… - arXiv preprint arXiv …, 2023 - arxiv.org
For robots to follow instructions from people, they must be able to connect the rich semantic
information in human vocabulary, e.g., "can you get me the pink stuffed whale?", to their …

Embodied task planning with large language models

Z Wu, Z Wang, X Xu, J Lu, H Yan - arXiv preprint arXiv:2307.01848, 2023 - arxiv.org
Equipping embodied agents with commonsense is important for robots to successfully
complete complex human instructions in general environments. Recent large language …

Physically grounded vision-language models for robotic manipulation

J Gao, B Sarkar, F Xia, T Xiao, J Wu, B Ichter… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …

Structured world models from human videos

R Mendonca, S Bahl, D Pathak - arXiv preprint arXiv:2308.10901, 2023 - arxiv.org
We tackle the problem of learning complex, general behaviors directly in the real world. We
propose an approach for robots to efficiently learn manipulation skills using only a handful of …

Assistive tele-op: Leveraging transformers to collect robotic task demonstrations

HM Clever, A Handa, H Mazhar, K Parker… - arXiv preprint arXiv …, 2021 - arxiv.org
Sharing autonomy between robots and human operators could facilitate data collection of
robotic task demonstrations to continuously improve learned models. Yet, the means to …