Towards a better metric for text-to-video generation

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org

The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Cited by 78 - Related articles - All 3 versions

[PDF] arxiv.org

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2025 - Springer

Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Cited by 55 - Related articles - All 2 versions

[PDF] arxiv.org

Is sora a world simulator? a comprehensive survey on general world models and beyond

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org

General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Cited by 27 - Related articles - All 3 versions

[PDF] arxiv.org

Videoscore: Building automatic metrics to simulate fine-grained human feedback for video generation

X He, D Jiang, G Zhang, M Ku, A Soni, S Siu… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent years have witnessed great advances in video generation. However, the
development of automatic video metrics is lagging significantly behind. None of the existing …

Cited by 19 - Related articles - All 3 versions

[PDF] arxiv.org

Chronomagic-bench: A benchmark for metamorphic evaluation of text-to-time-lapse video generation

S Yuan, J Huang, Y Xu, Y Liu, S Zhang, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org

We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to
evaluate the temporal and metamorphic capabilities of the T2V models (e.g., Sora and …

Cited by 17 - Related articles - All 4 versions

[PDF] thecvf.com

Evaluating and Improving Compositional Text-to-Visual Generation

B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …

Cited by 9 - Related articles

[PDF] arxiv.org

Subjective-aligned dataset and metric for text-to-video quality assessment

T Kou, X Liu, Z Zhang, C Li, H Wu, X Min… - Proceedings of the …, 2024 - dl.acm.org

With the rapid development of generative models, AI-Generated Content (AIGC) has
exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has …

Cited by 6 - Related articles - All 2 versions

[PDF] arxiv.org

K-sort arena: Efficient and reliable benchmarking for generative models via k-wise human preferences

Z Li, X Liu, D Fu, J Li, Q Gu, K Keutzer… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancement of visual generative models necessitates efficient and reliable
evaluation methods. Arena platform, which gathers user votes on model comparisons, can …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org

World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Cited by 31 - Related articles - All 2 versions

[PDF] arxiv.org

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …

Cited by 1 - Related articles - All 2 versions
