Grad-CAM: Visual explanations from deep networks via gradient-based localization RR Selvaraju, M Cogswell, A Das, R Vedantam, D Parikh, D Batra Proceedings of the IEEE international conference on computer vision, 618-626, 2017 | 21993 | 2017 |
VQA: Visual Question Answering S Antol, A Agrawal, J Lu, M Mitchell, D Batra, C Lawrence Zitnick, ... Proceedings of the IEEE International Conference on Computer Vision, 2425-2433, 2015 | 5901 | 2015 |
ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks J Lu, D Batra, D Parikh, S Lee Advances in neural information processing systems 32, 2019 | 3499 | 2019 |
Making the V in VQA matter: Elevating the role of image understanding in visual question answering Y Goyal, T Khot, D Summers-Stay, D Batra, D Parikh Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 2761 | 2017 |
Hierarchical question-image co-attention for visual question answering J Lu, J Yang, D Batra, D Parikh Advances in neural information processing systems 29, 2016 | 1945 | 2016 |
Habitat: A platform for embodied AI research M Savva, A Kadian, O Maksymets, Y Zhao, E Wijmans, B Jain, J Straub, ... Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 1285 | 2019 |
Visual dialog A Das, S Kottur, K Gupta, A Singh, D Yadav, JMF Moura, D Parikh, ... Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 1106 | 2017 |
Joint unsupervised learning of deep representations and image clusters J Yang, D Parikh, D Batra Proceedings of the IEEE conference on computer vision and pattern …, 2016 | 961 | 2016 |
Graph R-CNN for scene graph generation J Yang, J Lu, S Lee, D Batra, D Parikh Proceedings of the European conference on computer vision (ECCV), 670-685, 2018 | 932 | 2018 |
A corpus and cloze evaluation for deeper understanding of commonsense stories N Mostafazadeh, N Chambers, X He, D Parikh, D Batra, L Vanderwende, ... Proceedings of the 2016 Conference of the North American Chapter of the …, 2016 | 712 | 2016 |
Embodied question answering A Das, S Datta, G Gkioxari, S Lee, D Parikh, D Batra Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 677 | 2018 |
Don't just assume; look and answer: Overcoming priors for visual question answering A Agrawal, D Batra, D Parikh, A Kembhavi Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 663 | 2018 |
Grad-CAM: Why did you say that? RR Selvaraju, A Das, R Vedantam, M Cogswell, D Parikh, D Batra arXiv preprint arXiv:1611.07450, 2016 | 649 | 2016 |
Towards VQA models that can read A Singh, V Natarajan, M Shah, Y Jiang, X Chen, D Batra, D Parikh, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 642 | 2019 |
Ego4D: Around the world in 3,000 hours of egocentric video K Grauman, A Westbury, E Byrne, Z Chavis, A Furnari, R Girdhar, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 629 | 2022 |
The Replica dataset: A digital replica of indoor spaces J Straub, T Whelan, L Ma, Y Chen, E Wijmans, S Green, JJ Engel, ... arXiv preprint arXiv:1906.05797, 2019 | 625 | 2019 |
iCoseg: Interactive co-segmentation with intelligent scribble guidance D Batra, A Kowdle, D Parikh, J Luo, T Chen 2010 IEEE computer society conference on computer vision and pattern …, 2010 | 621 | 2010 |
Neural baby talk J Lu, J Yang, D Batra, D Parikh Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 554 | 2018 |
Counterfactual visual explanations Y Goyal, Z Wu, J Ernst, D Batra, D Parikh, S Lee International Conference on Machine Learning, 2376-2384, 2019 | 544 | 2019 |
Human attention in visual question answering: Do humans and deep networks look at the same regions? A Das, H Agrawal, L Zitnick, D Parikh, D Batra Computer Vision and Image Understanding 163, 90-100, 2017 | 531 | 2017 |