Cross-modal text and visual generation: A systematic review. Part 1: Image to text

M Żelaszczyk, J Mańdziuk - Information Fusion, 2023 - Elsevier
We review the existing literature on generating text from visual data under the cross-modal
generation umbrella, which allows us to compare and contrast various approaches taking …

UMIC: An unreferenced metric for image captioning via contrastive learning

H Lee, S Yoon, F Dernoncourt, T Bui, K Jung - arXiv preprint arXiv …, 2021 - arxiv.org
Despite the success of various text generation metrics such as BERTScore, it is still difficult
to evaluate image captions without enough reference captions due to the diversity of the …

Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Y Wada, K Kaneda, D Saito… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Establishing an automatic evaluation metric that closely aligns with human judgments is
essential for effectively developing image captioning models. Recent data-driven metrics …

Smurf: Semantic and linguistic understanding fusion for caption evaluation via typicality analysis

J Feinglass, Y Yang - arXiv preprint arXiv:2106.01444, 2021 - arxiv.org
The open-ended nature of visual captioning makes it a challenging area for evaluation. The
majority of proposed models rely on specialized training to improve human-correlation …

Ic3: Image captioning by committee consensus

DM Chan, A Myers, S Vijayanarasimhan… - arXiv preprint arXiv …, 2023 - arxiv.org
If you ask a human to describe an image, they might do so in a thousand different ways.
Traditionally, image captioning models are trained to generate a single "best" (most like a …

Towards an Exhaustive Evaluation of Vision-Language Foundation Models

E Salin, S Ayache, B Favre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vision-language foundation models have seen considerable performance improvements in the
last few years. However, there is still a lack of comprehensive evaluation methods able to …

A Survey of Image Captioning Based on Deep Learning

石义乐, 杨文忠, 杜慧祥, 王丽花, 王婷, 理珊珊 - 电子学报 (Acta Electronica Sinica), 2021 - ejournal.org.cn
Image captioning aims to extract image features, feed them into a language generation model, and output a corresponding description of the image,
addressing a problem at the intersection of natural language processing and computer vision in artificial intelligence: intelligent image understanding. This survey covers work from 2015 …

Validated image caption rating dataset

LD Narins, A Scott, A Gautam… - Advances in …, 2024 - proceedings.neurips.cc
We present a new high-quality validated image caption rating (VICR) dataset. How well a
caption fits an image can be difficult to assess due to the subjective nature of caption quality …

PraCegoVer: A Large Dataset for Image Captioning in Portuguese

GO dos Santos, EL Colombini, S Avila - Data, 2022 - mdpi.com
Automatically describing images using natural sentences is essential for the inclusion of visually
impaired people on the Internet. This problem is known as Image Captioning. There are …

Cross-modal language generation using pivot stabilization for web-scale language coverage

AV Thapliyal, R Soricut - arXiv preprint arXiv:2005.00246, 2020 - arxiv.org
The ability of cross-modal language generation tasks, such as image captioning, to support
non-English languages is directly hurt by the trend toward data-hungry models combined with …