A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

Video-P2P: Video editing with cross-attention control

S Liu, Y Zhang, W Li, Z Lin, J Jia - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video-P2P is the first framework for real-world video editing with cross-attention control.
While attention control has proven effective for image editing with pre-trained image …

Rerender a video: Zero-shot text-guided video-to-video translation

S Yang, Y Zhou, Z Liu, CC Loy - SIGGRAPH Asia 2023 Conference …, 2023 - dl.acm.org
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …

CoDeF: Content deformation fields for temporally consistent video processing

H Ouyang, Q Wang, Y Xiao, Q Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present the content deformation field (CoDeF) as a new type of video representation
which consists of a canonical content field aggregating the static contents in the entire video …

ControlVideo: Training-free controllable text-to-video generation

Y Zhang, Y Wei, D Jiang, X Zhang, W Zuo… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-driven diffusion models have unlocked unprecedented abilities in image generation,
whereas their video counterpart still lags behind due to the excessive training cost of …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Free-Bloom: Zero-shot text-to-video generator with LLM director and LDM animator

H Huang, Y Feng, C Shi, L Xu, J Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-to-video is a rapidly growing research area that aims to generate a semantically
faithful, identity-consistent, and temporally coherent sequence of frames that accurately aligns with the input text prompt …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …

DragonDiffusion: Enabling drag-style manipulation on diffusion models

C Mou, X Wang, J Song, Y Shan, J Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …