Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, such as Large Language Models (LLMs) and
Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - First Workshop on Vision …, 2024 - openreview.net
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …

Scaling cross-embodied learning: One policy for manipulation, navigation, locomotion and aviation

R Doshi, H Walke, O Mees, S Dasari… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern machine learning systems rely on large datasets to attain broad generalization, and
this often poses a challenge in robot learning, where each robotic platform and task might …

GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation

CL Cheang, G Chen, Y Jing, T Kong, H Li, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable
robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture …

General-purpose foundation models for increased autonomy in robot-assisted surgery

S Schmidgall, JW Kim, A Kuntz, AE Ghazi… - Nature Machine …, 2024 - nature.com
The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific
objectives that solve a single robotic problem such as picking up an object or reaching a …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

Surgical Robot Transformer (SRT): Imitation learning for surgical tasks

JW Kim, TZ Zhao, S Schmidgall, A Deguet… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore whether surgical manipulation tasks can be learned on the da Vinci robot via
imitation learning. However, the da Vinci system presents unique challenges which hinder …

TinyVLA: Towards fast, data-efficient vision-language-action models for robotic manipulation

J Wen, Y Zhu, J Li, M Zhu, K Wu, Z Xu, N Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor
control and instruction comprehension through end-to-end learning processes. However …

SpatialBot: Precise spatial understanding with vision language models

W Cai, I Ponomarenko, J Yuan, X Li, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) have achieved impressive performance in 2D image
understanding; however, they still struggle with spatial understanding, which is the …