Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Structured world models from human videos

R Mendonca, S Bahl, D Pathak - arXiv preprint arXiv:2308.10901, 2023 - arxiv.org
We tackle the problem of learning complex, general behaviors directly in the real world. We
propose an approach for robots to efficiently learn manipulation skills using only a handful of …

Reinforcement learning with videos: Combining offline observations with interaction

K Schmeckpeper, O Rybkin, K Daniilidis… - arXiv preprint arXiv …, 2020 - arxiv.org
Reinforcement learning is a powerful framework for robots to acquire skills from experience,
but often requires a substantial amount of online data collection. As a result, it is difficult to …

AVID: Learning multi-stage tasks via pixel-level translation of human videos

L Smith, N Dhawan, M Zhang, P Abbeel… - arXiv preprint arXiv …, 2019 - arxiv.org
Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex
behaviors through experience. However, realizing this promise for long-horizon tasks in the …

Masked world models for visual control

Y Seo, D Hafner, H Liu, F Liu, S James… - … on Robot Learning, 2023 - proceedings.mlr.press
Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient
robot learning from visual observations. Yet the current approaches typically train a single …

Affordance learning from play for sample-efficient policy learning

J Borja-Diaz, O Mees, G Kalweit… - … on Robotics and …, 2022 - ieeexplore.ieee.org
Robots operating in human-centered environments should have the ability to understand
how objects function: what can be done with each object, where this interaction may occur …

Learning generalizable robotic reward functions from "in-the-wild" human videos

AS Chen, S Nair, C Finn - arXiv preprint arXiv:2103.16817, 2021 - arxiv.org
We are motivated by the goal of generalist robots that can complete a wide range of tasks
across many environments. Critical to this is the robot's ability to acquire some metric of task …

Time-contrastive networks: Self-supervised learning from video

P Sermanet, C Lynch, Y Chebotar, J Hsu… - … on robotics and …, 2018 - ieeexplore.ieee.org
We propose a self-supervised approach for learning representations and robotic behaviors
entirely from unlabeled videos recorded from multiple viewpoints, and study how this …

Habitat-web: Learning embodied object-search strategies from human demonstrations at scale

R Ramrakhya, E Undersander… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a large-scale study of imitating human demonstrations on tasks that require a
virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g. 'find & …

Videodex: Learning dexterity from internet videos

K Shaw, S Bahl, D Pathak - Conference on Robot Learning, 2023 - proceedings.mlr.press
To build general robotic agents that can operate in many environments, it is often imperative
for the robot to collect experience in the real world. However, this is often not feasible due to …