QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

PaLI: A jointly-scaled multilingual language-image model

X Chen, X Wang, S Changpinyo… - arXiv preprint arXiv …, 2022 - arxiv.org
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …

From images to textual prompts: Zero-shot visual question answering with frozen large language models

J Guo, J Li, D Li, AMH Tiong, B Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large language models (LLMs) have demonstrated excellent zero-shot generalization to
new language tasks. However, effective utilization of LLMs for zero-shot visual question …

All you may need for VQA are image captions

S Changpinyo, D Kukliansky, I Szpektor… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but
has not enjoyed the same level of engagement in terms of data creation. In this paper, we …

Super-CLEVR: A virtual benchmark to diagnose domain robustness in visual reasoning

Z Li, X Wang, E Stengel-Eskin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual Question Answering (VQA) models often perform poorly on out-of-distribution
data and struggle on domain generalization. Due to the multi-modal nature of this task …

From images to textual prompts: Zero-shot VQA with frozen large language models

J Guo, J Li, D Li, AMH Tiong, B Li, D Tao… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) have demonstrated excellent zero-shot generalization to
new language tasks. However, effective utilization of LLMs for zero-shot visual question …

Visually Grounded Language Learning: a review of language games, datasets, tasks, and models

A Suglia, I Konstas, O Lemon - Journal of Artificial Intelligence Research, 2024 - jair.org
In recent years, several machine learning models have been proposed. They are trained
with a language modelling objective on large-scale text-only data. With such pretraining …

CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models

AR Akula, K Wang, C Liu, S Saba-Sadiya, H Lu… - iScience, 2022 - cell.com
We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new
explainable AI (XAI) framework for explaining decisions made by a deep convolutional …

Reassessing evaluation practices in visual question answering: A case study on out-of-distribution generalization

A Agrawal, I Kajić, E Bugliarello, E Davoodi… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision-and-language (V&L) models pretrained on large-scale multimodal data have
demonstrated strong performance on various tasks such as image captioning and visual …

Attention cannot be an explanation

AR Akula, SC Zhu - arXiv preprint arXiv:2201.11194, 2022 - arxiv.org
Attention based explanations (viz. saliency maps), by providing interpretability to black box
models such as deep neural networks, are assumed to improve human trust and reliance in …