FreeInit: Bridging initialization gap in video diffusion models

T Wu, C Si, Y Jiang, Z Huang, Z Liu - European Conference on Computer …, 2025 - Springer
Though diffusion-based video generation has witnessed rapid progress, the inference
results of existing models still exhibit unsatisfactory temporal consistency and unnatural …

Vlogger: Make your dream a vlog

S Zhuang, K Li, X Chen, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present Vlogger, a generic AI system for generating a minute-level video blog
(i.e., vlog) of user descriptions. Different from short videos of a few seconds, a vlog often …

Loong: Generating minute-level long videos with autoregressive language models

Y Wang, T Xiong, D Zhou, Z Lin, Y Zhao, B Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
It is desirable but challenging to generate content-rich long videos on the scale of minutes.
Autoregressive large language models (LLMs) have achieved great success in generating …

ConsistI2V: Enhancing visual consistency for image-to-video generation

W Ren, H Yang, G Zhang, C Wei, X Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to
create a video sequence. A grand challenge in I2V generation is to maintain visual …

ART-V: Auto-Regressive Text-to-Video Generation with Diffusion Models

W Weng, R Feng, Y Wang, Q Dai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present ART-V, an efficient framework for auto-regressive video generation with diffusion
models. Unlike existing methods that generate entire videos in one shot, ART-V generates a …

MovieDreamer: Hierarchical generation for coherent long visual sequence

C Zhao, M Liu, W Wang, W Chen, F Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in video generation have primarily leveraged diffusion models for
short-duration content. However, these approaches often fall short in modeling complex …

FreeLong: Training-free long video generation with SpectralBlend temporal attention

Y Lu, Y Liang, L Zhu, Y Yang - arXiv preprint arXiv:2407.19918, 2024 - arxiv.org
Video diffusion models have made substantial progress in various video generation
applications. However, training models for long video generation tasks requires significant …

StreamingT2V: Consistent, dynamic, and extendable long video generation from text

R Henschel, L Khachatryan, D Hayrapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video diffusion models enable the generation of high-quality videos that follow text
instructions, making it easy to create diverse and individual content. However, existing …

Evaluation of text-to-video generation models: A dynamics perspective

M Liao, H Lu, X Zhang, F Wan, T Wang, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Comprehensive and constructive evaluation protocols play an important role in the
development of sophisticated text-to-video (T2V) generation models. Existing evaluation …

FreeTraj: Tuning-free trajectory control in video diffusion models

H Qiu, Z Chen, Z Wang, Y He, M Xia, Z Liu - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated remarkable capability in video generation, which further
sparks interest in introducing trajectory control into the generation process. While existing …