A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

RT-1: Robotics transformer for real-world control at scale

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …

Open X-Embodiment: Robotic learning datasets and RT-X models

A O'Neill, A Rehman, A Gupta, A Maddukuri… - arXiv preprint arXiv …, 2023 - arxiv.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org
Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

A O'Neill, A Rehman, A Maddukuri… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Language to rewards for robotic skill synthesis

W Yu, N Gileadi, C Fu, S Kirmani, KH Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse
new capabilities through in-context learning, ranging from logical reasoning to code-writing …

Learning universal policies via text-guided video generation

Y Du, S Yang, B Dai, H Dai… - Advances in …, 2024 - proceedings.neurips.cc
A goal of artificial intelligence is to construct an agent that can solve a wide variety of tasks.
Recent progress in text-guided image synthesis has yielded models with an impressive …

Learning fine-grained bimanual manipulation with low-cost hardware

TZ Zhao, V Kumar, S Levine, C Finn - arXiv preprint arXiv:2304.13705, 2023 - arxiv.org
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously
difficult for robots because they require precision, careful coordination of contact forces, and …

R3M: A universal visual representation for robot manipulation

S Nair, A Rajeswaran, V Kumar, C Finn… - arXiv preprint arXiv …, 2022 - arxiv.org
We study how visual representations pre-trained on diverse human video data can enable
data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a …