SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has witnessed great development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …
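The snippet ends before the adapter itself is described. As a rough illustration of the idea named in the title, the sketch below adds a lightweight, zero-initialized temporal adapter on top of frozen text-to-image features; the module name, bottleneck width, and placement are assumptions, not SimDA's actual design.

```python
import torch
import torch.nn as nn

class TemporalAdapter(nn.Module):
    """Lightweight bottleneck adapter that mixes features along the time axis.

    Illustrative sketch only: the real adapter layout may differ.
    Input: (batch, frames, channels) feature tokens from a frozen T2I block.
    """
    def __init__(self, channels: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(channels, bottleneck)
        # Depthwise 1-D convolution over the frame dimension models temporal context.
        self.temporal = nn.Conv1d(bottleneck, bottleneck, kernel_size=3,
                                  padding=1, groups=bottleneck)
        self.up = nn.Linear(bottleneck, channels)
        nn.init.zeros_(self.up.weight)   # start as identity so the frozen model is unchanged
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.down(x)                                       # (B, T, bottleneck)
        h = self.temporal(h.transpose(1, 2)).transpose(1, 2)   # mix across frames
        return x + self.up(torch.relu(h))                      # residual: frozen path + adapter

# Usage: wrap frozen per-frame features of shape (batch, frames, channels).
feats = torch.randn(2, 16, 320)
out = TemporalAdapter(channels=320)(feats)    # same shape, now temporally mixed
```

Because the up-projection starts at zero, the pretrained T2I behaviour is preserved at initialization and only the small adapter needs to be trained.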

LaVie: High-quality video generation with cascaded latent diffusion models

Y Wang, X Chen, X Ma, S Zhou, Z Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a
pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …
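The abstract names a cascade of latent diffusion models but the snippet stops before describing it. The skeleton below only illustrates how such a cascade is typically wired (base text-to-video, temporal interpolation, spatial super-resolution); every stage here is a placeholder stub, not LaVie's pipeline.

```python
import torch

# Hypothetical stage interfaces for a cascaded latent video pipeline.
# Each stage would be its own latent diffusion model in practice.
def base_t2v(prompt: str, frames: int = 16, size: int = 32) -> torch.Tensor:
    """Stage 1: generate a short, low-resolution latent clip from text (placeholder)."""
    return torch.randn(frames, 4, size, size)

def temporal_interpolation(latents: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Stage 2: increase the frame rate (placeholder: repeat frames)."""
    return latents.repeat_interleave(factor, dim=0)

def spatial_super_resolution(latents: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Stage 3: upsample each frame (placeholder: nearest-neighbour)."""
    return torch.nn.functional.interpolate(latents, scale_factor=scale, mode="nearest")

video_latents = spatial_super_resolution(temporal_interpolation(base_t2v("a cat surfing")))
print(video_latents.shape)   # (32, 4, 64, 64)
```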

Video diffusion models

J Ho, T Salimans, A Gritsenko… - Advances in …, 2022 - proceedings.neurips.cc
Generating temporally coherent, high-fidelity video is an important milestone in generative
modeling research. We make progress towards this milestone by proposing a diffusion …
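For reference, the standard denoising-diffusion training objective the paper builds on is sketched below for a video-shaped tensor: noise a clip according to a schedule and train the network to predict that noise. The tiny 3D convolution stands in for the actual video architecture, which the snippet does not describe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy denoiser standing in for the real video network (illustrative only).
denoiser = nn.Conv3d(3, 3, kernel_size=3, padding=1)

# Linear noise schedule: alpha_bar[t] is the cumulative signal fraction at step t.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(video: torch.Tensor) -> torch.Tensor:
    """One DDPM training step: noise the clip, ask the model to predict the noise.

    video: (batch, channels, frames, height, width), values roughly in [-1, 1].
    """
    b = video.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(video)
    a = alpha_bar[t].view(b, 1, 1, 1, 1)
    noisy = a.sqrt() * video + (1 - a).sqrt() * noise   # forward (noising) process
    return F.mse_loss(denoiser(noisy), noise)           # epsilon-prediction objective

loss = diffusion_loss(torch.randn(2, 3, 8, 32, 32))
loss.backward()
```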

Efficient video prediction via sparsely conditioned flow matching

A Davtyan, S Sameni, P Favaro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce a novel generative model for video prediction based on latent flow matching,
an efficient alternative to diffusion-based models. In contrast to prior work, we keep the high …
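Flow matching is named as the efficient alternative to diffusion; a minimal, unconditional flow-matching loss is sketched below: sample a point on the straight path between noise and data and regress the constant velocity of that path. The small MLP and the flattened latent shape are assumptions, and the paper's sparse conditioning on context frames is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy velocity-field network; a real model would condition on context frames.
class VelocityNet(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))

def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """Regress the constant velocity (x1 - x0) along the straight noise-to-data path."""
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(x1.shape[0], 1)            # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the linear path
    target = x1 - x0                          # ODE velocity for this path
    return F.mse_loss(model(xt, t), target)

model = VelocityNet(dim=64)
latents = torch.randn(8, 64)                  # e.g. flattened frame latents
flow_matching_loss(model, latents).backward()
```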

MagicVideo: Efficient video generation with latent diffusion models

D Zhou, W Wang, H Yan, W Lv, Y Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We present an efficient text-to-video generation framework based on latent diffusion models,
termed MagicVideo. MagicVideo can generate smooth video clips that are concordant with …
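The framework is stated to be built on latent diffusion; the sketch below shows the generic pattern, assuming a frozen per-frame autoencoder that maps pixels into a compact latent space where the noise-prediction loss is computed. Both networks are placeholders, not MagicVideo's modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder frozen frame encoder (stride-8 downsampling, 4 latent channels),
# standing in for a pretrained VAE; illustrative only.
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)
for p in encoder.parameters():
    p.requires_grad_(False)

# Placeholder latent denoiser over (batch, 4, frames, H/8, W/8) latent clips.
denoiser = nn.Conv3d(4, 4, kernel_size=3, padding=1)

def latent_diffusion_loss(video: torch.Tensor, alpha_bar_t: float = 0.5) -> torch.Tensor:
    """Encode frames to latents, noise them, and predict the noise in latent space."""
    b, t = video.shape[:2]
    frames = video.flatten(0, 1)                  # (B*T, 3, H, W)
    z = encoder(frames).unflatten(0, (b, t))      # (B, T, 4, H/8, W/8)
    z = z.transpose(1, 2)                         # (B, 4, T, H/8, W/8)
    noise = torch.randn_like(z)
    noisy = alpha_bar_t ** 0.5 * z + (1 - alpha_bar_t) ** 0.5 * noise
    return F.mse_loss(denoiser(noisy), noise)

loss = latent_diffusion_loss(torch.randn(2, 8, 3, 64, 64))   # (B, T, C, H, W) video
loss.backward()
```

Running diffusion on the compact latents rather than raw pixels is what makes this family of methods comparatively efficient.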

MAGVIT: Masked generative video transformer

L Yu, Y Cheng, K Sohn, J Lezama… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various
video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …
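The abstract pairs a 3D tokenizer with a masked generative transformer; the sketch below shows the masked-token training step on already-quantized video tokens, in the MaskGIT style such models typically follow. The vocabulary size, token-grid shape, and tiny transformer are stand-ins, and the 3D tokenizer itself is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID = 1024, 1024          # codebook size + extra [MASK] token id

# Tiny stand-in for the generative transformer over flattened 3-D video tokens.
embed = nn.Embedding(VOCAB + 1, 256)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(256, VOCAB)

def masked_token_loss(tokens: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """Mask a fraction of the video tokens and train the transformer to recover them.

    tokens: (batch, seq) integer ids produced by a (not shown) 3-D tokenizer.
    """
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = head(backbone(embed(corrupted)))            # (B, seq, VOCAB)
    return F.cross_entropy(logits[mask], tokens[mask])   # loss only on masked positions

# Example: an 8x8x8 token grid flattened to a sequence of 512 ids per clip.
tokens = torch.randint(0, VOCAB, (2, 512))
masked_token_loss(tokens).backward()
```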

Video probabilistic diffusion models in projected latent space

S Yu, K Sohn, S Kim, J Shin - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Despite the remarkable progress in deep generative models, synthesizing high-resolution
and temporally coherent videos remains a challenge due to their high dimensionality …
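The title's "projected latent space" is not explained in the snippet; as a toy illustration of trading one 3D video latent for a few image-like 2D latents, the code below simply pools a clip onto three axis-aligned planes. PVDM learns such projections with an autoencoder, so mean pooling here is only a stand-in for the idea.

```python
import torch

def project_video_latent(video: torch.Tensor):
    """Toy 'projected latent' of a clip: pool the 3-D volume onto three 2-D planes.

    video: (channels, frames, height, width). Mean pooling is only an illustration
    of replacing one cubic latent with a few image-like planar latents.
    """
    hw_plane = video.mean(dim=1)   # (C, H, W): content shared across time
    th_plane = video.mean(dim=3)   # (C, T, H): dynamics along the height axis
    tw_plane = video.mean(dim=2)   # (C, T, W): dynamics along the width axis
    return hw_plane, th_plane, tw_plane

planes = project_video_latent(torch.randn(4, 16, 32, 32))
print([p.shape for p in planes])   # [(4, 32, 32), (4, 16, 32), (4, 16, 32)]
```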

Latent-Shift: Latent diffusion with temporal shift for efficient text-to-video generation

J An, S Zhang, H Yang, S Gupta, JB Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose Latent-Shift, an efficient text-to-video generation method based on a pretrained
text-to-image generation model that consists of an autoencoder and a U-Net diffusion model …
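The temporal shift named in the title can be illustrated with a parameter-free channel shift along the frame axis (in the spirit of TSM): part of the channels look one frame back, part one frame ahead, so each frame's features mix with their temporal neighbours. The shift fraction and where the operation sits inside the U-Net are assumptions.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Shift feature channels along the frame axis (zero extra parameters).

    x: (batch, frames, channels, height, width). 1/fold_div of the channels is
    shifted one frame into the future, another 1/fold_div one frame into the past,
    and the rest are left untouched.
    """
    fold = x.shape[2] // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                     # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]     # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]                # remaining channels untouched
    return out

feats = torch.randn(2, 16, 64, 8, 8)     # per-frame U-Net features
shifted = temporal_shift(feats)          # same shape, temporally mixed
```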

Generating videos with dynamics-aware implicit generative adversarial networks

S Yu, J Tack, S Mo, H Kim, J Kim, JW Ha… - arXiv preprint arXiv …, 2022 - arxiv.org
In the deep learning era, generating high-quality long videos still remains challenging due
to the spatio-temporal complexity and continuity of videos. Prior works have …

Fairy: Fast parallelized instruction-guided video-to-video synthesis

B Wu, CY Chuang, X Wang, Y Jia… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion
models that enhances them for video editing applications. Our approach centers on the concept …