This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task …
Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion …
We introduce a novel generative model for video prediction based on latent flow matching, an efficient alternative to diffusion-based models. In contrast to prior work, we keep the high …
D Zhou, W Wang, H Yan, W Lv, Y Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo. MagicVideo can generate smooth video clips that are concordant with …
Abstract We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video …
S Yu, K Sohn, S Kim, J Shin - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Despite the remarkable progress in deep generative models, synthesizing high-resolution and temporally coherent videos still remains a challenge due to their high-dimensionality …
We propose Latent-Shift--an efficient text-to-video generation method based on a pretrained text-to-image generation model that consists of an autoencoder and a U-Net diffusion model …
In the deep learning era, long video generation of high-quality still remains challenging due to the spatio-temporal complexity and continuity of videos. Existing prior works have …
B Wu, CY Chuang, X Wang, Y Jia… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper we introduce Fairy a minimalist yet robust adaptation of image-editing diffusion models enhancing them for video editing applications. Our approach centers on the concept …