Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

Zero-shot robot manipulation from passive human videos

H Bharadhwaj, A Gupta, S Tulsiani, V Kumar - arXiv preprint arXiv …, 2023 - arxiv.org
Can we learn robot manipulation for everyday tasks, only by watching videos of humans
doing arbitrary tasks in different unstructured settings? Unlike widely adopted strategies of …

Unleashing large-scale video generative pre-training for visual robot manipulation

H Wu, Y Jing, C Cheang, G Chen, J Xu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …

Genaug: Retargeting behaviors to unseen situations via generative augmentation

Z Chen, S Kiami, A Gupta, V Kumar - arXiv preprint arXiv:2302.06671, 2023 - arxiv.org
Robot learning methods have the potential for widespread generalization across tasks,
environments, and objects. However, these methods require large diverse datasets that are …

Open-world object manipulation using pre-trained vision-language models

A Stone, T Xiao, Y Lu, K Gopalakrishnan… - arXiv preprint arXiv …, 2023 - arxiv.org
For robots to follow instructions from people, they must be able to connect the rich semantic
information in human vocabulary, e.g. "can you get me the pink stuffed whale?", to their …

Learning to see before learning to act: Visual pre-training for manipulation

L Yen-Chen, A Zeng, S Song, P Isola… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Does having visual priors (e.g. the ability to detect objects) facilitate learning to perform vision-
based manipulation (e.g. picking up objects)? We study this problem under the framework of …

What can I do here? Learning new skills by imagining visual affordances

A Khazatsky, A Nair, D Jing… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
A generalist robot equipped with learned skills must be able to perform many tasks in many
different environments. However, zero-shot generalization to new settings is not always …

Secant: Self-expert cloning for zero-shot generalization of visual policies

L Fan, G Wang, DA Huang, Z Yu, L Fei-Fei… - arXiv preprint arXiv …, 2021 - arxiv.org
Generalization has been a long-standing challenge for reinforcement learning (RL). Visual
RL, in particular, can be easily distracted by irrelevant factors in high-dimensional …

A latent space of stochastic diffusion models for zero-shot image editing and guidance

CH Wu, F De la Torre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Diffusion models generate images by iterative denoising. Recent work has shown that by
making the denoising process deterministic, one can encode real images into latent codes …

Sequential dexterity: Chaining dexterous policies for long-horizon manipulation

Y Chen, C Wang, L Fei-Fei, CK Liu - arXiv preprint arXiv:2309.00987, 2023 - arxiv.org
Many real-world manipulation tasks consist of a series of subtasks that are significantly
different from one another. Such long-horizon, complex tasks highlight the potential of …