Perceiver-Actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press
Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Scaling up and distilling down: Language-guided robot skill acquisition

H Ha, P Florence, S Song - Conference on Robot Learning, 2023 - proceedings.mlr.press
We present a framework for robot skill acquisition, which 1) efficiently scales up data
generation of language-labelled robot data and 2) effectively distills this data down into a …

Socratic models: Composing zero-shot multimodal reasoning with language

A Zeng, M Attarian, B Ichter, K Choromanski… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …

Diffusion policy: Visuomotor policy learning via action diffusion

C Chi, S Feng, Y Du, Z Xu, E Cousineau… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces Diffusion Policy, a new way of generating robot behavior by
representing a robot's visuomotor policy as a conditional denoising diffusion process. We …

Real-world robot learning with masked visual pre-training

I Radosavovic, T Xiao, S James… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we explore self-supervised visual pre-training on images from diverse,
in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre …

Learning fine-grained bimanual manipulation with low-cost hardware

TZ Zhao, V Kumar, S Levine, C Finn - arXiv preprint arXiv:2304.13705, 2023 - arxiv.org
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously
difficult for robots because they require precision, careful coordination of contact forces, and …

CLIPort: What and where pathways for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot Learning, 2022 - proceedings.mlr.press
How can we imbue robots with the ability to manipulate objects precisely but also to reason
about them in terms of abstract concepts? Recent works in manipulation have shown that …

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Behavior Transformers: Cloning modes with one stone

NM Shafiullah, Z Cui… - Advances in neural …, 2022 - proceedings.neurips.cc
While behavior learning has made impressive progress in recent times, it lags behind
computer vision and natural language processing due to its inability to leverage large …