- Academic resource search

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Cited by 29 · Related articles · All 2 versions

[PDF] arxiv.org

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org

Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

Cited by 32 · Related articles

[PDF] arxiv.org

Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org

If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Cited by 25 · Related articles · All 4 versions

[PDF] arxiv.org

Large language models for robotics: Opportunities, challenges, and perspectives

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

The foundation model transparency index

R Bommasani, K Klyman, S Longpre, S Kapoor… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models have rapidly permeated society, catalyzing a wave of generative AI
applications spanning enterprise and consumer-facing contexts. While the societal impact of …

Cited by 33 · Related articles · All 2 versions

[PDF] arxiv.org

Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org

Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

Cited by 56 · Related articles · All 3 versions

[PDF] arxiv.org

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org

The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

Cited by 8 · Related articles · All 4 versions

[PDF] arxiv.org

RT-H: Action hierarchies using language

S Belkhale, T Ding, T Xiao, P Sermanet… - arXiv preprint arXiv …, 2024 - arxiv.org

Language provides a way to break down complex concepts into digestible pieces. Recent
works in robot imitation learning use language-conditioned policies that predict actions …

Cited by 7 · Related articles · All 2 versions

[PDF] arxiv.org

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian, A Majumdar, J Sun… - arXiv preprint arXiv …, 2023 - arxiv.org

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Cited by 33 · Related articles · All 2 versions

[PDF] arxiv.org

DROID: A large-scale in-the-wild robot manipulation dataset

A Khazatsky, K Pertsch, S Nair, A Balakrishna… - arXiv preprint arXiv …, 2024 - arxiv.org

The creation of large, diverse, high-quality robot manipulation datasets is an important
stepping stone on the path toward more capable and robust robotic manipulation policies …

Cited by 5 · Related articles · All 4 versions
