Deeper LSTM and normalized CNN visual question answering model

Z Cai, N Vasconcelos - … of the IEEE conference on computer …, 2018 - openaccess.thecvf.com

In object detection, an intersection over union (IoU) threshold is required to define positives
and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces …

Cited by 5888 · Related articles · All 14 versions

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com

Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …

Cited by 5080 · Related articles · All 16 versions

Making the V in VQA matter: Elevating the role of image understanding in visual question answering

Y Goyal, T Khot, D Summers-Stay… - Proceedings of the …, 2017 - openaccess.thecvf.com

Problems at the intersection of vision and language are of significant importance both as
challenging research questions and for the rich set of applications they enable. However …

Cited by 2808 · Related articles · All 15 versions

Grad-CAM: Visual explanations from deep networks via gradient-based localization

RR Selvaraju, M Cogswell, A Das… - Proceedings of the …, 2017 - openaccess.thecvf.com

We propose a technique for producing 'visual explanations' for decisions from a large class
of Convolutional Neural Network (CNN)-based models, making them more transparent. Our …

Cited by 18589 · Related articles · All 11 versions

Grad-CAM: visual explanations from deep networks via gradient-based localization

RR Selvaraju, M Cogswell, A Das, R Vedantam… - International journal of …, 2020 - Springer

We propose a technique for producing 'visual explanations' for decisions from a large class
of Convolutional Neural Network (CNN)-based models, making them more transparent and …

Cited by 4226 · Related articles · All 9 versions

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 7 · Related articles · All 2 versions

Visual dialog

A Das, S Kottur, K Gupta, A Singh… - Proceedings of the …, 2017 - openaccess.thecvf.com

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful
dialog with humans in natural, conversational language about visual content. Specifically …

Cited by 1122 · Related articles · All 18 versions

Don't just assume; look and answer: Overcoming priors for visual question answering

A Agrawal, D Batra, D Parikh… - Proceedings of the …, 2018 - openaccess.thecvf.com

A number of studies have found that today's Visual Question Answering (VQA) models are
heavily driven by superficial correlations in the training data and lack sufficient image …

Cited by 670 · Related articles · All 7 versions

Grad-CAM: Why did you say that?

RR Selvaraju, A Das, R Vedantam, M Cogswell… - arXiv preprint arXiv …, 2016 - arxiv.org

We propose a technique for making Convolutional Neural Network (CNN)-based models
more transparent by visualizing input regions that are 'important' for predictions -- or visual …

Cited by 668 · Related articles · All 2 versions

Tips and tricks for visual question answering: Learnings from the 2017 challenge

D Teney, P Anderson, X He… - Proceedings of the …, 2018 - openaccess.thecvf.com

This paper presents a state-of-the-art model for visual question answering (VQA), which won
the first place in the 2017 VQA Challenge. VQA is a task of significant importance for …

Cited by 467 · Related articles · All 12 versions
