A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most notably in the Computer Vision …

Explainability of deep vision-based autonomous driving systems: Review and challenges

É Zablocki, H Ben-Younes, P Pérez, M Cord - International Journal of …, 2022 - Springer
This survey reviews explainability methods for vision-based self-driving systems trained with
behavior cloning. The concept of explainability has several facets and the need for …

An empirical study of GPT-3 for few-shot knowledge-based VQA

Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Knowledge-based visual question answering (VQA) involves answering questions
that require external knowledge not present in the image. Existing methods first retrieve …

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Explaining the black-box model: A survey of local interpretation methods for deep neural networks

Y Liang, S Li, C Yan, M Li, C Jiang - Neurocomputing, 2021 - Elsevier
Recently, a significant amount of research has investigated the interpretation of deep
neural networks (DNNs), which are normally treated as black-box models. Among the …

Visual instruction tuning with Polite Flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large
Language Models (LLMs) using an assortment of annotated downstream vision-language …

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

S Gu, C Clark, A Kembhavi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Many high-level skills that are required for computer vision tasks, such as parsing questions,
comparing and contrasting semantics, and writing descriptions, are also required in other …

NLX-GPT: A model for natural language explanations in vision and vision-language tasks

F Sammani, T Mukherjee… - Proceedings of the …, 2022 - openaccess.thecvf.com
Natural language explanation (NLE) models aim to explain the decision-making process
of a black-box system by generating natural language sentences that are human-friendly …

Teach me to explain: A review of datasets for explainable natural language processing

S Wiegreffe, A Marasović - arXiv preprint arXiv:2102.12060, 2021 - arxiv.org
Explainable NLP (ExNLP) has increasingly focused on collecting human-annotated textual
explanations. These explanations are used downstream in three ways: as data …

e-ViL: A dataset and benchmark for natural language explanations in vision-language tasks

M Kayser, OM Camburu, L Salewski… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recently, there has been an increasing number of efforts to introduce models capable of
generating natural language explanations (NLEs) for their predictions on vision-language …