R3M: A universal visual representation for robot manipulation

S Nair, A Rajeswaran, V Kumar, C Finn… - arXiv preprint arXiv …, 2022 - arxiv.org
We study how visual representations pre-trained on diverse human video data can enable
data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A survey of zero-shot generalisation in deep reinforcement learning

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org
The study of zero-shot generalisation (ZSG) in deep reinforcement learning (RL) aims to
produce RL algorithms whose policies generalise well to novel, unseen situations at …

VIP: Towards universal visual reward and representation via value-implicit pre-training

YJ Ma, S Sodhani, D Jayaraman, O Bastani… - arXiv preprint arXiv …, 2022 - arxiv.org
Reward and representation learning are two long-standing challenges for learning an
expanding set of robot manipulation skills from sensory observations. Given the inherent …

Open X-Embodiment: Robotic learning datasets and RT-X models

A Padalkar, A Pooley, A Jain, A Bewley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
in efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Reinforcement learning with action-free pre-training from videos

Y Seo, K Lee, SL James… - … Conference on Machine …, 2022 - proceedings.mlr.press
Recent unsupervised pre-training methods have been shown to be effective on language and
vision domains by learning useful representations for multiple downstream tasks. In this …

MimicPlay: Long-horizon imitation learning by watching human play

C Wang, L Fan, J Sun, R Zhang, L Fei-Fei, D Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from human demonstrations is a promising paradigm for teaching robots
manipulation skills in the real world. However, learning complex long-horizon tasks often …

DexMV: Imitation learning for dexterous manipulation from human videos

Y Qin, YH Wu, S Liu, H Jiang, R Yang, Y Fu… - European Conference on …, 2022 - Springer
While significant progress has been made on understanding hand-object interactions in
computer vision, it is still very challenging for robots to perform complex dexterous …

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Reinforcement learning from passive data via latent intentions

D Ghosh, CA Bhateja, S Levine - … Conference on Machine …, 2023 - proceedings.mlr.press
Passive observational data, such as human videos, is abundant and rich in information, yet
remains largely untapped by current RL methods. Perhaps surprisingly, we show that …