AVA: A large-scale database for aesthetic visual analysis

X Li, K Yuan, Y Pei, Y Lu, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality
Assessment (S-UGC VQA) where various excellent solutions are submitted and evaluated …

Cited by 19 · Related articles · All 3 versions

[PDF] researchgate.net

Recommender systems leveraging multimedia content

Y Deldjoo, M Schedl, P Cremonesi, G Pasi - ACM Computing Surveys …, 2020 - dl.acm.org

Recommender systems have become a popular and effective means to manage the ever-
increasing amount of multimedia content available today and to help users discover …

Cited by 193 · Related articles · All 5 versions

[PDF] 3dvar.com

[PDF] Hierarchical text-conditional image generation with clip latents

A Ramesh, P Dhariwal, A Nichol, C Chu… - arXiv preprint arXiv …, 2022 - 3dvar.com

Contrastive models like CLIP have been shown to learn robust representations of images
that capture both semantics and style. To leverage these representations for image …

Cited by 4812 · Related articles · All 3 versions

[PDF] arxiv.org

Maxvit: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer

Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …

Cited by 459 · Related articles · All 8 versions

[PDF] thecvf.com

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Cited by 47 · Related articles · All 3 versions

[PDF] aaai.org

Exploring clip for assessing the look and feel of images

J Wang, KCK Chan, CC Loy - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org

Measuring the perception of visual content is a long-standing problem in computer vision.
Many mathematical models have been developed to evaluate the look or quality of an …

Cited by 165 · Related articles · All 5 versions

[PDF] neurips.cc

Optimizing prompts for text-to-image generation

Y Hao, Z Chi, L Dong, F Wei - Advances in Neural …, 2024 - proceedings.neurips.cc

Well-designed prompts can guide text-to-image models to generate amazing images.
However, the performant prompts are often model-specific and misaligned with user input …

Cited by 89 · Related articles · All 5 versions

[PDF] thecvf.com

Imagen editor and editbench: Advancing and evaluating text-guided image inpainting

S Wang, C Saharia, C Montgomery… - Proceedings of the …, 2023 - openaccess.thecvf.com

Text-guided image editing can have a transformative impact in supporting creative
applications. A key challenge is to generate edits that are faithful to the input text prompt …

Cited by 93 · Related articles · All 9 versions

[PDF] thecvf.com

Musiq: Multi-scale image quality transformer

J Ke, Q Wang, Y Wang, P Milanfar… - Proceedings of the …, 2021 - openaccess.thecvf.com

Image quality assessment (IQA) is an important research topic for understanding and
improving visual experience. The current state-of-the-art IQA methods are based on …

Cited by 354 · Related articles · All 7 versions

[PDF] neurips.cc

Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

Cited by 43 · Related articles · All 7 versions
