This monograph surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches …
Abstract Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower …
JZ Wu, Y Ge, X Wang, SW Lei, Y Gu… - Proceedings of the …, 2023 - openaccess.thecvf.com
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such …
L Khachatryan, A Movsisyan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to …
C Qi, X Cun, Y Zhang, C Lei, X Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The diffusion-based generative models have achieved remarkable success in text-based image generation. However, since it contains enormous randomness in generation …
S Ge, S Nah, G Liu, T Poon, A Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally …
We present Stable Video Diffusion-a latent video diffusion model for high-resolution, state-of- the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
Image diffusion models, trained on massive image collections, have emerged as the most versatile image generator model in terms of quality and diversity. They support inverting real …
Deep generative models are becoming increasingly powerful, now generating diverse high fidelity photo-realistic samples given text prompts. Have they reached the point where …