Cascade R-CNN: Delving into high quality object detection

Z Cai, N Vasconcelos - … of the IEEE conference on computer …, 2018 - openaccess.thecvf.com
In object detection, an intersection over union (IoU) threshold is required to define positives
and negatives. An object detector, trained with low IoU threshold, e.g. 0.5, usually produces …

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …

Making the V in VQA matter: Elevating the role of image understanding in visual question answering

Y Goyal, T Khot, D Summers-Stay… - Proceedings of the …, 2017 - openaccess.thecvf.com
Problems at the intersection of vision and language are of significant importance both as
challenging research questions and for the rich set of applications they enable. However …

Grad-CAM: Visual explanations from deep networks via gradient-based localization

RR Selvaraju, M Cogswell, A Das… - Proceedings of the …, 2017 - openaccess.thecvf.com
We propose a technique for producing 'visual explanations' for decisions from a large class
of Convolutional Neural Network (CNN)-based models, making them more transparent. Our …

Grad-CAM: visual explanations from deep networks via gradient-based localization

RR Selvaraju, M Cogswell, A Das, R Vedantam… - International journal of …, 2020 - Springer
We propose a technique for producing 'visual explanations' for decisions from a large class
of Convolutional Neural Network (CNN)-based models, making them more transparent and …

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Visual dialog

A Das, S Kottur, K Gupta, A Singh… - Proceedings of the …, 2017 - openaccess.thecvf.com
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful
dialog with humans in natural, conversational language about visual content. Specifically …

Don't just assume; look and answer: Overcoming priors for visual question answering

A Agrawal, D Batra, D Parikh… - Proceedings of the …, 2018 - openaccess.thecvf.com
A number of studies have found that today's Visual Question Answering (VQA) models are
heavily driven by superficial correlations in the training data and lack sufficient image …

Grad-CAM: Why did you say that?

RR Selvaraju, A Das, R Vedantam, M Cogswell… - arXiv preprint arXiv …, 2016 - arxiv.org
We propose a technique for making Convolutional Neural Network (CNN)-based models
more transparent by visualizing input regions that are 'important' for predictions--or visual …

Tips and tricks for visual question answering: Learnings from the 2017 challenge

D Teney, P Anderson, X He… - Proceedings of the …, 2018 - openaccess.thecvf.com
This paper presents a state-of-the-art model for visual question answering (VQA), which won
the first place in the 2017 VQA Challenge. VQA is a task of significant importance for …