Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned …
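As a hedged illustration of the low-dimensional latent space this abstract refers to, here is a minimal NumPy sketch of the VAE reparameterization trick and the closed-form KL term; the toy dimensions and the linear encoder are assumptions for illustration, not details from the cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 8-dimensional data, 2-dimensional latent space.
x_dim, z_dim = 8, 2

# Linear "encoder" mapping x to the mean and log-variance of q(z|x).
W_mu = rng.normal(size=(z_dim, x_dim)) * 0.1
W_lv = rng.normal(size=(z_dim, x_dim)) * 0.1

def encode(x):
    # Returns (mu, log sigma^2) of the approximate posterior q(z|x).
    return W_mu @ x, W_lv @ x

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps the sample differentiable w.r.t. mu, logvar.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

x = rng.normal(size=x_dim)
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
kl = kl_to_standard_normal(mu, logvar)
```

The KL term regularizes the latent code toward a standard normal prior, which is what makes the learned low-dimensional space usable for sampling and generation.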
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower …
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition …
We present Stable Video Diffusion, a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the …
Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion …
Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data …
We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments …
Z Gao, C Tan, L Wu, SZ Li - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
From CNN, RNN, to ViT, we have witnessed remarkable advancements in video prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated …