AlphaBlock: Embodied finetuning for vision-language reasoning in robot manipulation

C Jin, W Tan, J Yang, B Liu, R Song, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a novel framework for learning high-level cognitive capabilities in robot
manipulation tasks, such as making a smiley face using building blocks. These tasks often …

Mastering robot manipulation with multimodal prompts through pretraining and multi-task fine-tuning

J Li, Q Gao, M Johnston, X Gao, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt-based learning has been demonstrated as a compelling paradigm contributing to
large language models' (LLMs) tremendous success. Inspired by their success in language …

Closed-loop open-vocabulary mobile manipulation with GPT-4V

P Zhi, Z Zhang, M Han, Z Zhang, Z Li, Z Jiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous robot navigation and manipulation in open environments require reasoning
and replanning with closed-loop feedback. We present COME-robot, the first closed-loop …

Physically grounded vision-language models for robotic manipulation

J Gao, B Sarkar, F Xia, T Xiao, J Wu, B Ichter… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …

NaturalVLM: Leveraging fine-grained natural language for affordance-guided visual manipulation

R Xu, Y Shen, X Li, R Wu, H Dong - arXiv preprint arXiv:2403.08355, 2024 - arxiv.org
Enabling home-assistant robots to perceive and manipulate a diverse range of 3D objects
based on human language instructions is a pivotal challenge. Prior research has …

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

F Ni, J Hao, S Wu, L Kou, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Robotics agents often struggle to understand and follow multi-modal prompts in complex
manipulation scenes, which are difficult to describe sufficiently and accurately by …

Language-Conditioned Robotic Manipulation with Fast and Slow Thinking

M Zhu, Y Zhu, J Li, J Wen, Z Xu, Z Che, C Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Language-conditioned robotic manipulation aims to translate natural language
instructions into executable actions, from simple pick-and-place to tasks requiring intent …

Learning neuro-symbolic programs for language guided robot manipulation

K Namasivayam, H Singh, V Bindal… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Given a natural language instruction and an input scene, our goal is to train a model to
output a manipulation program that can be executed by the robot. Prior approaches for this …

Spatial-language attention policies for efficient robot learning

P Parashar, V Jain, X Zhang, J Vakil, S Powers… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite great strides in language-guided manipulation, existing work has been constrained
to table-top settings. Table-tops allow for perfect and consistent camera angles, properties …

VIMA: Robot manipulation with multimodal prompts

Y Jiang, A Gupta, Z Zhang, G Wang, Y Dou, Y Chen… - 2023 - openreview.net
Prompt-based learning has emerged as a successful paradigm in natural language
processing, where a single general-purpose language model can be instructed to perform …