NTIRE 2024 challenge on short-form UGC video quality assessment: Methods and results

X Li, K Yuan, Y Pei, Y Lu, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality
Assessment (S-UGC VQA), in which various excellent solutions were submitted and evaluated …

Vbench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Videocrafter2: Overcoming data limitations for high-quality video diffusion models

H Chen, Y Zhang, X Cun, M Xia… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-video generation aims to produce a video based on a given prompt. Recently,
several commercial video models have been able to generate plausible videos with minimal …

AIS 2024 challenge on video quality assessment of user-generated content: Methods and results

MV Conde, S Zadtootaghaj, N Barman… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge focused on
User-Generated Content (UGC). The aim of this challenge was to gather deep learning-based …

Evalcrafter: Benchmarking and evaluating large video generation models

Y Liu, X Cun, X Liu, X Wang, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision and language generative models have grown rapidly in recent years. For
video generation, various open-source models and publicly available services have been …

Q-instruct: Improving low-level visual abilities for multi-modality foundation models

H Wu, Z Zhang, E Zhang, C Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modality large language models (MLLMs), as represented by GPT-4V, have introduced
a paradigm shift for visual perception and understanding tasks, in that a variety of abilities can …

Q-bench: A benchmark for general-purpose foundation models on low-level vision

H Wu, Z Zhang, E Zhang, C Chen, L Liao… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift
in computer vision from specialized models to general-purpose foundation models …

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

S Zhou, P Yang, J Wang, Y Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-based diffusion models have exhibited remarkable success in generation and editing,
showing great promise for enhancing visual content with their generative prior. However …

Q-align: Teaching LMMs for visual scoring via discrete text-defined levels

H Wu, Z Zhang, W Zhang, C Chen, L Liao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org
The explosion of visual content available online underscores the need for an
accurate machine assessor to robustly evaluate scores across diverse types of visual …

Facetalk: Audio-driven motion diffusion for neural parametric head models

S Aneja, J Thies, A Dai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity
3D motion sequences of talking human heads from an input audio signal. To capture the …