A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Video-P2P: Video editing with cross-attention control

S Liu, Y Zhang, W Li, Z Lin, J Jia - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video-P2P is the first framework for real-world video editing with cross-attention control.
While attention control has proven effective for image editing with pre-trained image …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …

CoDeF: Content deformation fields for temporally consistent video processing

H Ouyang, Q Wang, Y Xiao, Q Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present the content deformation field (CoDeF) as a new type of video representation
which consists of a canonical content field aggregating the static contents in the entire video …

ControlVideo: Training-free controllable text-to-video generation

Y Zhang, Y Wei, D Jiang, X Zhang, W Zuo… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-driven diffusion models have unlocked unprecedented abilities in image generation,
whereas their video counterpart still lags behind due to the excessive training cost of …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Free-Bloom: Zero-shot text-to-video generator with LLM director and LDM animator

H Huang, Y Feng, C Shi, L Xu, J Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-to-video is a rapidly growing research area that aims to generate a semantically
faithful, identity-consistent, and temporally coherent sequence of frames that accurately aligns with the input text prompt …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

DragonDiffusion: Enabling drag-style manipulation on diffusion models

C Mou, X Wang, J Song, Y Shan, J Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …