A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Show-1: Marrying pixel and latent diffusion models for text-to-video generation

DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu… - International Journal of …, 2024 - Springer
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …

MotionCtrl: A unified and flexible motion controller for video generation

Z Wang, Z Yuan, X Wang, Y Li, T Chen, M Xia… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Motions in a video primarily consist of camera motion, induced by camera movement, and
object motion, resulting from object movement. Accurate control of both camera and object …

Autodecoding latent 3D diffusion models

E Ntavelis, A Siarohin, K Olszewski… - Advances in …, 2023 - proceedings.neurips.cc
Diffusion-based methods have shown impressive visual results in the text-to-image domain.
They first learn a latent space using an autoencoder, then run a denoising process on the …

Make pixels dance: High-dynamic video generation

Y Zeng, G Wei, J Zheng, J Zou, Y Wei… - Proceedings of the …, 2024 - openaccess.thecvf.com
Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects
poses a significant challenge in the field of artificial intelligence. Unfortunately, current state …

SEINE: Short-to-long video diffusion model for generative transition and prediction

X Chen, Y Wang, L Zhang, S Zhuang, X Ma… - The Twelfth …, 2023 - openreview.net
Recently, video generation has achieved substantial progress with realistic results.
Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") …

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

LivePhoto: Real image animation with text-guided motion control

X Chen, Z Liu, M Chen, Y Feng, Y Liu, Y Shen… - … on Computer Vision, 2025 - Springer
Despite the recent progress in text-to-video generation, existing studies usually overlook the
issue that only spatial contents but not temporal motions in synthesized videos are under the …

Snap Video: Scaled spatiotemporal transformers for text-to-video synthesis

W Menapace, A Siarohin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contemporary models for generating images show remarkable quality and versatility.
Swayed by these advantages, the research community repurposes them to generate videos …

Vlogger: Make your dream a vlog

S Zhuang, K Li, X Chen, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present Vlogger, a generic AI system for generating a minute-level video blog
(i.e., vlog) of user descriptions. Different from short videos with a few seconds, vlog often …