Large language models for robotics: Opportunities, challenges, and perspectives

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2 a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text images audio and/or videos as input and can …

Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org
Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

The foundation model transparency index

R Bommasani, K Klyman, S Longpre, S Kapoor… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models have rapidly permeated society, catalyzing a wave of generative AI
applications spanning enterprise and consumer-facing contexts. While the societal impact of …

Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian, A Majumdar, J Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Vision-language foundation models as effective robot imitators

X Li, M Liu, H Zhang, C Yu, J Xu, H Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent progress in vision language foundation models has shown their ability to understand
multimodal data and resolve complicated vision language tasks, including robotics …

On bringing robots home

NMM Shafiullah, A Rai, H Etukuru, Y Liu, I Misra… - arXiv preprint arXiv …, 2023 - arxiv.org
Throughout history, we have successfully integrated various machines into our homes.
Dishwashers, laundry machines, stand mixers, and robot vacuums are a few recent …