A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2025 - Springer
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Is Sora a world simulator? A comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

ChronoMagic-Bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (e.g., Sora and …

Evaluating and Improving Compositional Text-to-Visual Generation

B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …

Subjective-aligned dataset and metric for text-to-video quality assessment

T Kou, X Liu, Z Zhang, C Li, H Wu, X Min… - Proceedings of the …, 2024 - dl.acm.org
With the rapid development of generative models, AI-Generated Content (AIGC) has
exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has …

K-Sort Arena: Efficient and reliable benchmarking for generative models via K-wise human preferences

Z Li, X Liu, D Fu, J Li, Q Gu, K Keutzer… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of visual generative models necessitates efficient and reliable
evaluation methods. Arena platform, which gathers user votes on model comparisons, can …

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …