A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most notably in the Computer Vision …

Explainability of deep vision-based autonomous driving systems: Review and challenges

É Zablocki, H Ben-Younes, P Pérez, M Cord - International Journal of …, 2022 - Springer
This survey reviews explainability methods for vision-based self-driving systems trained with
behavior cloning. The concept of explainability has several facets and the need for …

An empirical study of GPT-3 for few-shot knowledge-based VQA

Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Knowledge-based visual question answering (VQA) involves answering questions
that require external knowledge not present in the image. Existing methods first retrieve …

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Explaining the black-box model: A survey of local interpretation methods for deep neural networks

Y Liang, S Li, C Yan, M Li, C Jiang - Neurocomputing, 2021 - Elsevier
Recently, a significant amount of research has investigated the interpretation of deep
neural networks (DNNs), which are normally treated as black-box models. Among the …

Visual instruction tuning with Polite Flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large
Language Models (LLMs) using an assortment of annotated downstream vision-language …

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

S Gu, C Clark, A Kembhavi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Many high-level skills that are required for computer vision tasks, such as parsing questions,
comparing and contrasting semantics, and writing descriptions, are also required in other …

NLX-GPT: A model for natural language explanations in vision and vision-language tasks

F Sammani, T Mukherjee… - Proceedings of the …, 2022 - openaccess.thecvf.com
Natural language explanation (NLE) models aim to explain the decision-making process
of a black-box system by generating natural language sentences that are human-friendly …

Teach me to explain: A review of datasets for explainable natural language processing

S Wiegreffe, A Marasović - arXiv preprint arXiv:2102.12060, 2021 - arxiv.org
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual
explanations. These explanations are used downstream in three ways: as data …

e-ViL: A dataset and benchmark for natural language explanations in vision-language tasks

M Kayser, OM Camburu, L Salewski… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recently, there has been an increasing number of efforts to introduce models capable of
generating natural language explanations (NLEs) for their predictions on vision-language …