Phenaki: Variable length video generation from open domain textual descriptions

R Villegas, M Babaeizadeh, PJ Kindermans… - International …, 2022 - openreview.net
We present Phenaki, a model capable of realistic video synthesis given a sequence of
textual prompts. Generating videos from text is particularly challenging due to the …

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc
Generating temporally coherent high fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …

Mcvd-masked conditional video diffusion for prediction, generation, and interpolation

V Voleti, A Jolicoeur-Martineau… - Advances in neural …, 2022 - proceedings.neurips.cc
Video prediction is a challenging task. The quality of video frames from current state-of-the-
art (SOTA) generative models tends to be poor and generalization beyond the training data …

Flexible diffusion modeling of long videos

W Harvey, S Naderiparizi, V Masrani… - Advances in …, 2022 - proceedings.neurips.cc
We present a framework for video modeling based on denoising diffusion probabilistic
models that produces long-duration video completions in a variety of realistic environments …

Magvit: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …

Diffusion probabilistic modeling for video generation

R Yang, P Srivastava, S Mandt - Entropy, 2023 - mdpi.com
Denoising diffusion probabilistic models are a promising new class of generative models
that mark a milestone in high-quality image generation. This paper showcases their ability to …

Gaia-1: A generative world model for autonomous driving

A Hu, L Russell, H Yeo, Z Murez, G Fedoseev… - arXiv preprint arXiv …, 2023 - arxiv.org
Autonomous driving promises transformative improvements to transportation, but building
systems capable of safely navigating the unstructured complexity of real-world scenarios …

Maskvit: Masked visual pre-training for video prediction

A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to predict future visual observations conditioned on past observations and motor
commands can enable embodied agents to plan solutions to a variety of tasks in complex …

Reinforcement learning with action-free pre-training from videos

Y Seo, K Lee, SL James… - … Conference on Machine …, 2022 - proceedings.mlr.press
Recent unsupervised pre-training methods have shown to be effective on language and
vision domains by learning useful representations for multiple downstream tasks. In this …

Dall-e-bot: Introducing web-scale diffusion models to robotics

I Kapelyukh, V Vosylius, E Johns - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot
enables a robot to rearrange objects in a scene, by first inferring a text description of those …