Zero-shot robotic manipulation with pretrained image-editing diffusion models

K Black, M Nakamoto, P Atreya, H Walke… - arXiv preprint arXiv …, 2023 - arxiv.org
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …

ZM-Net: Real-time zero-shot image manipulation network

H Wang, X Liang, H Zhang, DY Yeung… - arXiv preprint arXiv …, 2017 - arxiv.org
Many problems in image processing and computer vision (e.g., colorization, style transfer) can
be posed as 'manipulating' an input image into a corresponding output image given a user …
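
As a concrete illustration of the framing in this snippet, image manipulation can be viewed as a conditional image-to-image mapping f(image, guidance) → image. The sketch below uses a generic FiLM-style conditioning layer; the module, names, and dimensions are hypothetical placeholders for illustration, not ZM-Net's actual architecture.

```python
import torch
import torch.nn as nn

class ConditionalImageMapper(nn.Module):
    """Hypothetical conditional image-to-image network: a guidance vector
    (e.g. a style or instruction embedding) modulates intermediate features."""

    def __init__(self, channels=3, guidance_dim=64, hidden=32):
        super().__init__()
        self.encode = nn.Conv2d(channels, hidden, 3, padding=1)
        self.film = nn.Linear(guidance_dim, 2 * hidden)  # per-channel scale and shift
        self.decode = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, image, guidance):
        h = torch.relu(self.encode(image))
        scale, shift = self.film(guidance).chunk(2, dim=-1)
        h = h * scale[..., None, None] + shift[..., None, None]  # FiLM conditioning
        return self.decode(h)

net = ConditionalImageMapper()
edited = net(torch.randn(1, 3, 64, 64), torch.randn(1, 64))  # image + guidance -> image
```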

Open-world object manipulation using pre-trained vision-language models

A Stone, T Xiao, Y Lu, K Gopalakrishnan… - arXiv preprint arXiv …, 2023 - arxiv.org
For robots to follow instructions from people, they must be able to connect the rich semantic
information in human vocabulary, e.g., "can you get me the pink stuffed whale?", to their …

GenAug: Retargeting behaviors to unseen situations via generative augmentation

Z Chen, S Kiami, A Gupta, V Kumar - arXiv preprint arXiv:2302.06671, 2023 - arxiv.org
Robot learning methods have the potential for widespread generalization across tasks,
environments, and objects. However, these methods require large diverse datasets that are …

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

H Wu, Y Jing, C Cheang, G Chen, J Xu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …

Zero-shot robot manipulation from passive human videos

H Bharadhwaj, A Gupta, S Tulsiani, V Kumar - arXiv preprint arXiv …, 2023 - arxiv.org
Can we learn robot manipulation for everyday tasks, only by watching videos of humans
doing arbitrary tasks in different unstructured settings? Unlike widely adopted strategies of …

Learning to see before learning to act: Visual pre-training for manipulation

L Yen-Chen, A Zeng, S Song, P Isola… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Does having visual priors (e.g., the ability to detect objects) facilitate learning to perform
vision-based manipulation (e.g., picking up objects)? We study this problem under the framework of …

A latent space of stochastic diffusion models for zero-shot image editing and guidance

CH Wu, F De la Torre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Diffusion models generate images by iterative denoising. Recent work has shown that by
making the denoising process deterministic, one can encode real images into latent codes …
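
The deterministic encoding this snippet alludes to is DDIM-style sampling with the stochastic term set to zero: the same update rule, stepped forward in time, inverts a real image into a latent code that the reverse pass reconstructs. Below is a minimal sketch of that mechanism; the toy noise schedule and the placeholder `eps_model` are assumptions for illustration, not the paper's actual model.

```python
import torch

def ddim_step(x_t, t, t_next, alpha_bar, eps_model):
    """One deterministic DDIM update (eta = 0) from timestep t to t_next."""
    a_t, a_next = alpha_bar[t], alpha_bar[t_next]
    eps = eps_model(x_t, t)                                # predicted noise
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean image
    return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps

T = 50
betas = torch.linspace(1e-4, 0.02, T)           # toy linear schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)
eps_model = lambda x, t: torch.zeros_like(x)    # stand-in for a trained network

x = torch.randn(1, 3, 8, 8)
for t in range(T - 1):                          # inversion: encode image -> latent
    x = ddim_step(x, t, t + 1, alpha_bar, eps_model)
latent = x
for t in range(T - 1, 0, -1):                   # generation: latent -> image
    latent = ddim_step(latent, t, t - 1, alpha_bar, eps_model)
```

Because each step is deterministic, the decoding loop retraces the encoding loop exactly, which is what makes editing and guidance in the resulting latent space possible.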

DeltaEdit: Exploring text-free training for text-driven image manipulation

Y Lyu, T Lin, F Li, D He, J Dong… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Text-driven image manipulation remains challenging in terms of both training and inference flexibility.
Conditional generative models depend heavily on expensive annotated training data …

SECANT: Self-expert cloning for zero-shot generalization of visual policies

L Fan, G Wang, DA Huang, Z Yu, L Fei-Fei… - arXiv preprint arXiv …, 2021 - arxiv.org
Generalization has been a long-standing challenge for reinforcement learning (RL). Visual
RL, in particular, can be easily distracted by irrelevant factors in high-dimensional …