Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org
Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Large language models for robotics: Opportunities, challenges, and perspectives

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

The foundation model transparency index

R Bommasani, K Klyman, S Longpre, S Kapoor… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models have rapidly permeated society, catalyzing a wave of generative AI
applications spanning enterprise and consumer-facing contexts. While the societal impact of …

Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

RT-H: Action hierarchies using language

S Belkhale, T Ding, T Xiao, P Sermanet… - arXiv preprint arXiv …, 2024 - arxiv.org
Language provides a way to break down complex concepts into digestible pieces. Recent
works in robot imitation learning use language-conditioned policies that predict actions …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian, A Majumdar, J Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

DROID: A large-scale in-the-wild robot manipulation dataset

A Khazatsky, K Pertsch, S Nair, A Balakrishna… - arXiv preprint arXiv …, 2024 - arxiv.org
The creation of large, diverse, high-quality robot manipulation datasets is an important
stepping stone on the path toward more capable and robust robotic manipulation policies …