Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation

C Eze, C Crick - arXiv preprint arXiv:2402.07127, 2024 - arxiv.org
Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased
datasets. While curated datasets can help, challenges remain in generalizability and real …

RoboNet: Large-scale multi-robot learning

S Dasari, F Ebert, S Tian, S Nair, B Bucher… - arXiv preprint arXiv …, 2019 - arxiv.org
Robot learning has emerged as a promising tool for taming the complexity and diversity of
the real world. Methods based on high-capacity models, such as deep networks, hold the …

R3M: A universal visual representation for robot manipulation

S Nair, A Rajeswaran, V Kumar, C Finn… - arXiv preprint arXiv …, 2022 - arxiv.org
We study how visual representations pre-trained on diverse human video data can enable
data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a …

Giving robots a hand: Learning generalizable manipulation with eye-in-hand human video demonstrations

MJ Kim, J Wu, C Finn - arXiv preprint arXiv:2307.05959, 2023 - arxiv.org
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and
generalization in vision-based robotic manipulation. However, for robotic imitation, it is still …

VideoDex: Learning dexterity from internet videos

K Shaw, S Bahl, D Pathak - Conference on Robot Learning, 2023 - proceedings.mlr.press
To build general robotic agents that can operate in many environments, it is often imperative
for the robot to collect experience in the real world. However, this is often not feasible due to …

Learning video-conditioned policies for unseen manipulation tasks

E Chane-Sane, C Schmid… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
The ability to specify robot commands by a non-expert user is critical for building generalist
agents capable of solving a large variety of tasks. One convenient way to specify the …

VIMA: Robot manipulation with multimodal prompts

Y Jiang, A Gupta, Z Zhang, G Wang, Y Dou, Y Chen… - 2023 - openreview.net
Prompt-based learning has emerged as a successful paradigm in natural language
processing, where a single general-purpose language model can be instructed to perform …

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

D Niu, Y Sharma, G Biamby, J Quenum, Y Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, instruction-tuned Large Multimodal Models (LMMs) have been successful at
several tasks, including image captioning and visual question answering; yet leveraging …

Adversarial skill networks: Unsupervised robot skill learning from video

O Mees, M Merklinger, G Kalweit… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Key challenges for the deployment of reinforcement learning (RL) agents in the real world
are the discovery, representation and reuse of skills in the absence of a reward function. To …

Learning Manipulation by Predicting Interaction

J Zeng, Q Bu, B Wang, W Xia, L Chen, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Representation learning approaches for robotic manipulation have boomed in recent years.
Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large …