Large language models for robotics: Opportunities, challenges, and perspectives

J Wang, Z Wu, Y Li, H Jiang, P Shu, E Shi, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …

Open X-Embodiment: Robotic learning datasets and RT-X models

A Padalkar, A Pooley, A Jain, A Bewley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

C Eze, C Crick - arXiv preprint arXiv:2402.07127, 2024 - arxiv.org
Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased
datasets. While curated datasets can help, challenges remain in generalizability and real …

GELLO: A general, low-cost, and intuitive teleoperation framework for robot manipulators

P Wu, Y Shentu, Z Yi, X Lin, P Abbeel - arXiv preprint arXiv:2309.13037, 2023 - arxiv.org
Imitation learning from human demonstrations is a powerful framework to teach robots new
skills. However, the performance of the learned policies is bottlenecked by the quality, scale …

Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill

W Cai, S Huang, G Cheng, Y Long, P Gao… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot object navigation is a challenging task for home-assistance robots. This task
emphasizes visual grounding, commonsense inference and locomotion abilities, where the …

3D-VLA: A 3D vision-language-action generative world model

H Zhen, X Qiu, P Chen, J Yang, X Yan, Y Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the
broader realm of the 3D physical world. Furthermore, they perform action prediction by …

Robot fine-tuning made easy: Pre-training rewards and policies for autonomous real-world reinforcement learning

J Yang, MS Mark, B Vu, A Sharma, J Bohg… - arXiv preprint arXiv …, 2023 - arxiv.org
The pre-train and fine-tune paradigm in machine learning has had dramatic success in a
wide range of domains because the use of existing data or pre-trained models on the …

Large-scale actionless video pre-training via discrete diffusion for efficient policy learning

H He, C Bai, L Pan, W Zhang, B Zhao, X Li - arXiv preprint arXiv …, 2024 - arxiv.org
Learning a generalist embodied agent capable of completing multiple tasks poses
challenges, primarily stemming from the scarcity of action-labeled robotic datasets. In …

Score models for offline goal-conditioned reinforcement learning

H Sikchi, R Chitnis, A Touati, A Geramifard… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve
multiple goals in an environment purely from offline datasets using sparse reward functions …

Pre-trained text-to-image diffusion models are versatile representation learners for control

G Gupta, K Yadav, Y Gal, D Batra, Z Kira, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied AI agents require a fine-grained understanding of the physical world mediated
through visual and language inputs. Such capabilities are difficult to learn solely from task …