From images to textual prompts: Zero-shot visual question answering with frozen large language models

J Guo, J Li, D Li, AMH Tiong, B Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large language models (LLMs) have demonstrated excellent zero-shot generalization to
new language tasks. However, effective utilization of LLMs for zero-shot visual question …
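
A minimal sketch of the caption-then-prompt idea this line of work builds on: describe the image with an off-the-shelf captioner, then hand the caption and the question to a frozen LLM. Model names and the prompt template below are illustrative, and the paper's full pipeline additionally synthesizes exemplar question-answer pairs from the image, so this is a simplification, not the authors' method.

```python
# Hedged sketch of caption-then-prompt zero-shot VQA (a simplification of
# the paper's pipeline; model names and prompt template are illustrative).
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
llm = pipeline("text-generation", model="facebook/opt-1.3b", max_new_tokens=20)

def zero_shot_vqa(image_path: str, question: str) -> str:
    caption = captioner(image_path)[0]["generated_text"]
    prompt = f"Context: {caption}\nQuestion: {question}\nShort answer:"
    out = llm(prompt)[0]["generated_text"]
    return out[len(prompt):].strip()  # keep only the newly generated answer

print(zero_shot_vqa("kitchen.jpg", "What is on the counter?"))
```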

All you may need for VQA are image captions

S Changpinyo, D Kukliansky, I Szpektor… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but
has not enjoyed the same level of engagement in terms of data creation. In this paper, we …
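
The recipe pointed at here is to mine question-answer pairs from existing captions. Below is a rough illustration using an answer-aware question-generation model; the checkpoint name is a hypothetical placeholder, and the paper's actual pipeline (with answer-candidate extraction and filtering) is more involved.

```python
# Rough sketch: synthesize VQA training pairs from image captions via
# answer-aware question generation. The checkpoint name is a placeholder
# for any T5 model fine-tuned for question generation.
from transformers import pipeline

qg = pipeline("text2text-generation", model="your-org/t5-question-generation")

def caption_to_qa(caption: str, answer_span: str) -> dict:
    # The model sees the caption plus a span from it to use as the answer.
    prompt = f"answer: {answer_span} context: {caption}"
    question = qg(prompt)[0]["generated_text"]
    return {"question": question, "answer": answer_span}

pair = caption_to_qa("A brown dog chases a red frisbee in the park.", "a red frisbee")
print(pair)  # e.g. {"question": "What does the dog chase?", "answer": "a red frisbee"}
```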

A thorough review of models, evaluation metrics, and datasets on image captioning

G Luo, L Cheng, C Jing, C Zhao… - IET Image Processing, 2022 - Wiley Online Library
Image captioning means automatically generating descriptive sentences from a query image.
It has recently received widespread attention from the computer vision and natural language …

Context matters for image descriptions for accessibility: Challenges for referenceless evaluation metrics

E Kreiss, C Bennett, S Hooshmand, E Zelikman… - arXiv preprint arXiv …, 2022 - arxiv.org
Few images on the Web receive alt-text descriptions that would make them accessible to
blind and low vision (BLV) users. Image-based NLG systems have progressed to the point …

MaXM: Towards multilingual visual question answering

S Changpinyo, L Xue, M Yarom, AV Thapliyal… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) has been primarily studied through the lens of the English
language. Yet, tackling VQA in other languages in the same manner would require a …

Pre-training multi-modal dense retrievers for outside-knowledge visual question answering

A Salemi, M Rafiee, H Zamani - Proceedings of the 2023 ACM SIGIR …, 2023 - dl.acm.org
This paper studies a category of visual question answering tasks, in which accessing
external knowledge is necessary for answering the questions. This category is called …
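
For orientation, the dense-retrieval step can be illustrated with a generic bi-encoder: embed the query and a passage corpus, then rank passages by similarity. Representing the image by a caption, as below, is a simplification; the paper pre-trains a genuinely multi-modal retriever, and the encoder checkpoint here is only illustrative.

```python
# Bare-bones dense retrieval for outside-knowledge VQA. Using a text caption
# in place of the image is a simplification of the paper's multi-modal setup.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative bi-encoder

passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Golden retrievers were bred in Scotland as gundogs.",
    "The Colosseum in Rome could hold roughly 50,000 spectators.",
]
passage_emb = encoder.encode(passages, convert_to_tensor=True)

def retrieve(question: str, image_caption: str, k: int = 2):
    query_emb = encoder.encode(f"{question} {image_caption}", convert_to_tensor=True)
    scores = util.cos_sim(query_emb, passage_emb)[0]  # similarity to each passage
    top = scores.topk(k)
    return [(passages[int(i)], float(s)) for s, i in zip(top.values, top.indices)]

print(retrieve("When was this landmark built?", "a large iron tower in Paris"))
```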

ZVQAF: Zero-shot visual question answering with feedback from large language models

C Liu, C Wang, Y Peng, Z Li - Neurocomputing, 2024 - Elsevier
Owing to the strong zero-shot generalization to new language tasks demonstrated by large
language models (LLMs), applying LLMs to zero-shot visual question answering (VQA) has …
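
The abstract points at using LLM feedback to improve zero-shot VQA. As a loose illustration of that general pattern (not ZVQAF's specific architecture or training objective), an LLM can be asked to critique and revise a candidate answer:

```python
# Generic LLM-feedback loop for zero-shot VQA answers. This shows the broad
# idea of an LLM critiquing candidates; it is not ZVQAF's actual method.
from transformers import pipeline

llm = pipeline("text-generation", model="facebook/opt-1.3b", max_new_tokens=30)

def refine_answer(caption: str, question: str, candidate: str) -> str:
    prompt = (
        f"Image description: {caption}\n"
        f"Question: {question}\n"
        f"Proposed answer: {candidate}\n"
        "If the proposed answer is consistent with the description, repeat it; "
        "otherwise give a better short answer.\nFinal answer:"
    )
    out = llm(prompt)[0]["generated_text"]
    return out[len(prompt):].strip()  # keep only the revised answer

print(refine_answer("two cats asleep on a sofa", "How many cats are there?", "three"))
```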

ContextRef: Evaluating Referenceless Metrics For Image Description Generation

E Kreiss, E Zelikman, C Potts, N Haber - arXiv preprint arXiv:2309.11710, 2023 - arxiv.org
Referenceless metrics (e.g., CLIPScore) use pretrained vision-language models to assess
image descriptions directly without costly ground-truth reference texts. Such methods can …
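
For context, a CLIPScore-style referenceless metric is simple to compute: embed the image and the candidate description with CLIP and rescale their cosine similarity (Hessel et al., 2021, use 2.5 * max(cos, 0)). This sketches the kind of metric being evaluated, not ContextRef itself.

```python
# Sketch of a CLIPScore-style referenceless metric: cosine similarity between
# CLIP image and text embeddings, rescaled as in Hessel et al. (2021).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image_path: str, description: str) -> float:
    image = Image.open(image_path)
    inputs = processor(text=[description], images=image,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    cos = float((img * txt).sum())
    return 2.5 * max(cos, 0.0)  # CLIPScore rescaling

print(clip_score("photo.jpg", "a child flying a kite on a beach"))
```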

MaskEval: Weighted MLM-based evaluation for text summarization and simplification

YL Liu, R Bawden, T Scialom, B Sagot… - arXiv preprint arXiv …, 2022 - arxiv.org
In text summarization and simplification, system outputs must be evaluated along multiple
dimensions such as relevance, factual consistency, fluency, and grammaticality, and a wide …
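
The underlying MLM-scoring idea can be sketched quickly: mask each token of the candidate text in turn and average the masked LM's log-probability of the original token. MaskEval learns per-token weights on top of such scores; the uniform-weight, single-text version below is a simplification.

```python
# Sketch of MLM-based scoring: mask each token of a candidate text and
# average the MLM's log-probability of the original token. MaskEval learns
# per-token weights; here every token is weighted uniformly.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def mlm_score(text: str) -> float:
    ids = tok(text, return_tensors="pt")["input_ids"][0]
    log_probs = []
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return sum(log_probs) / len(log_probs)

print(mlm_score("The summary is fluent and factually consistent."))
```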