Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

VideoCrafter2: Overcoming data limitations for high-quality video diffusion models

H Chen, Y Zhang, X Cun, M Xia… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video generation aims to produce a video based on a given prompt. Recently,
several commercial video models have been able to generate plausible videos with minimal …

Vlogger: Make your dream a vlog

S Zhuang, K Li, X Chen, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present Vlogger, a generic AI system for generating a minute-level video blog
(i.e., vlog) from user descriptions. Unlike short videos of a few seconds, a vlog often …

ART-V: Auto-Regressive Text-to-Video Generation with Diffusion Models

W Weng, R Feng, Y Wang, Q Dai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present ART-V, an efficient framework for auto-regressive video generation with diffusion
models. Unlike existing methods that generate entire videos in one shot, ART-V generates a …

InstructVideo: instructing video diffusion models with human feedback

H Yuan, S Zhang, X Wang, Y Wei… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have emerged as the de facto paradigm for video generation. However,
their reliance on web-scale data of varied quality often yields results that are visually …

Leo: Generative latent image animator for human video synthesis

Y Wang, X Ma, X Chen, C Chen, A Dantcheva… - International Journal of …, 2024 - Springer
Spatio-temporal coherency is a major challenge in synthesizing high-quality videos,
particularly human videos that contain rich global and local deformations. To …

StreamingT2V: Consistent, dynamic, and extendable long video generation from text

R Henschel, L Khachatryan, D Hayrapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video diffusion models enable the generation of high-quality videos that follow text
instructions, making it easy to create diverse and individual content. However, existing …

Follow-your-click: Open-domain regional image animation via short prompts

Y Ma, Y He, H Wang, A Wang, C Qi, C Cai, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advances in image-to-video generation, better controllability and local
animation are less explored. Most existing image-to-video methods are not locally aware …

Brush your text: Synthesize any scene text on images via diffusion model

L Zhang, X Chen, Y Wang, Y Lu, Y Qiao - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Recently, diffusion-based image generation methods have been credited for their remarkable
text-to-image generation capabilities, while still facing challenges in accurately generating …