

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

Authors

Xuweiyi Chen, Tian Xia, Sihan Xu

Publication date

2024/3/4

Journal

arXiv preprint arXiv:2403.02332

Description

Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content. Despite the progress, ensuring consistency across frames remains a challenge, particularly when using text prompts as control conditions. To address this problem, we introduce UniCtrl, a novel, plug-and-play method that is universally applicable to improve the spatiotemporal consistency and motion diversity of videos generated by text-to-video models without additional training. UniCtrl ensures semantic consistency across different frames through cross-frame self-attention control, and meanwhile, enhances the motion quality and spatiotemporal consistency through motion injection and spatiotemporal synchronization. Our experimental results demonstrate UniCtrl's efficacy in enhancing various text-to-video models, confirming its effectiveness and universality.
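The abstract names cross-frame self-attention control but does not spell out its mechanics. One common training-free way to realize the idea is to let every frame's queries attend to the keys and values of a single anchor frame, so that all frames draw appearance features from a shared source. The sketch below assumes that scheme with the first frame as the anchor. It is not UniCtrl's code. The function names, the `anchor` parameter and the toy shapes are hypothetical, and motion injection and spatiotemporal synchronization are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = feature dimensions


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def per_frame_self_attention(qs: List[Matrix], ks: List[Matrix],
                             vs: List[Matrix]) -> List[Matrix]:
    # Baseline: each frame attends only to its own keys and values.
    return [attention(q, k, v) for q, k, v in zip(qs, ks, vs)]


def cross_frame_self_attention(qs: List[Matrix], ks: List[Matrix],
                               vs: List[Matrix], anchor: int = 0) -> List[Matrix]:
    # Hypothetical cross-frame control: every frame keeps its own queries
    # but reads keys and values from the anchor frame.
    return [attention(q, ks[anchor], vs[anchor]) for q in qs]


if __name__ == "__main__":
    # Toy example: 2 frames, 2 tokens per frame, feature dimension 2.
    qs = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
    ks = [[[1.0, 0.0], [0.0, 1.0]], [[9.0, 9.0], [-9.0, 3.0]]]
    vs = [[[1.0, 2.0], [1.0, 2.0]], [[5.0, 5.0], [7.0, 7.0]]]
    print(cross_frame_self_attention(qs, ks, vs))
```

Because the anchor frame's values are the same for every token in this toy input, every output token in every frame comes out as roughly `[1.0, 2.0]`, whatever that frame's own keys and values are. The per-frame baseline would instead mix frame 2's own values. This is the consistency effect the abstract attributes to cross-frame attention control. The cost is that frames can lose independent motion, which is presumably why the paper pairs it with a separate motion-injection step.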

Total citations

Cited by 1

2024: 1

Scholar articles

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

X Chen, T Xia, S Xu - arXiv preprint arXiv:2403.02332, 2024

Cited by 1 · All 2 versions