P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine …
Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However …
RR Selvaraju, M Cogswell, A Das… - Proceedings of the …, 2017 - openaccess.thecvf.com
We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our …
RR Selvaraju, M Cogswell, A Das, R Vedantam… - International journal of …, 2020 - Springer
We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and …
The multimodal task of Visual Question Answering (VQA), which encompasses elements of Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically …
A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image …
RR Selvaraju, A Das, R Vedantam, M Cogswell… - arXiv preprint arXiv …, 2016 - arxiv.org
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions--or visual …
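The core computation behind this family of visual explanations (as described in the Grad-CAM papers above) is a gradient-weighted sum of convolutional feature maps. A minimal NumPy sketch, assuming the feature maps and their class-score gradients have already been extracted from the network (the function name and toy inputs here are illustrative, not from the papers):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM-style heatmap from conv feature maps and their gradients.

    activations: (C, H, W) feature maps from the last convolutional layer.
    gradients:   (C, H, W) gradients of the target class score w.r.t. those maps.
    """
    # Global-average-pool the gradients to get one importance weight per channel.
    weights = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum over channels, then ReLU to keep only positive influence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    # Normalize to [0, 1] for overlay visualization (guard against all-zero maps).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 3 channels of 4x4 feature maps with random gradients.
rng = np.random.default_rng(0)
A = rng.random((3, 4, 4))
G = rng.standard_normal((3, 4, 4))
heatmap = grad_cam(A, G)
print(heatmap.shape)  # (4, 4)
```

In practice the activations and gradients would come from a framework hook on the last convolutional layer, and the low-resolution heatmap would be upsampled to the input image size before overlaying.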
This paper presents a state-of-the-art model for visual question answering (VQA), which won the first place in the 2017 VQA Challenge. VQA is a task of significant importance for …