VQA: Visual Question Answering. A Agrawal*, J Lu*, S Antol*, M Mitchell, CL Zitnick, D Parikh, D Batra. International Journal of Computer Vision 123 (1), 4-31, 2017. Cited by 5943*.
VQA: Visual Question Answering. S Antol, A Agrawal, J Lu, M Mitchell, D Batra, CL Zitnick, ... Proceedings of the IEEE International Conference on Computer Vision, 2425-2433, 2015. Cited by 5935.
ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. J Lu, D Batra, D Parikh, S Lee. Advances in Neural Information Processing Systems, 2019. Cited by 3522.
Hierarchical question-image co-attention for visual question answering. J Lu, J Yang, D Batra, D Parikh. Advances in Neural Information Processing Systems 29, 2016. Cited by 1949.
Knowing when to look: Adaptive attention via a visual sentinel for image captioning. J Lu*, C Xiong*, D Parikh, R Socher. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. Cited by 1765.
Graph R-CNN for Scene Graph Generation. J Yang*, J Lu*, S Lee, D Batra, D Parikh. arXiv preprint arXiv:1808.00191, 2018. Cited by 932.
Neural Baby Talk. J Lu*, J Yang*, D Batra, D Parikh. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. Cited by 547.
12-in-1: Multi-Task Vision and Language Representation Learning. J Lu*, V Goswami*, M Rohrbach, D Parikh, S Lee. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. Cited by 515.
ParlAI: A dialog research software platform. AH Miller, W Feng, A Fisch, J Lu, D Batra, A Bordes, D Parikh, J Weston. arXiv preprint arXiv:1705.06476, 2017. Cited by 436.
Unified-IO: A unified model for vision, language, and multi-modal tasks. J Lu, C Clark, R Zellers, R Mottaghi, A Kembhavi. arXiv preprint arXiv:2206.08916, 2022. Cited by 304.
Self-monitoring navigation agent via auxiliary progress estimation. CY Ma, J Lu, Z Wu, G AlRegib, Z Kira, R Socher, C Xiong. arXiv preprint arXiv:1901.03035, 2019. Cited by 279.
MERLOT Reserve: Neural script knowledge through vision and language and sound. R Zellers, J Lu, X Lu, Y Yu, Y Zhao, M Salehi, A Kusupati, J Hessel, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. Cited by 213.
Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model. J Lu, A Kannan, J Yang, D Parikh, D Batra. Advances in Neural Information Processing Systems 30, 2017. Cited by 142.
Sentinel gate for modulating auxiliary information in a long short-term memory (LSTM) neural network. J Lu, C Xiong, R Socher. US Patent 10,565,306, 2020. Cited by 139.
Multi-modal answer validation for knowledge-based VQA. J Wu, J Lu, A Sabharwal, R Mottaghi. Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 2712-2721, 2022. Cited by 114.
Adaptive attention model for image captioning. J Lu, C Xiong, R Socher. US Patent 10,565,305, 2020. Cited by 113.
A Faster PyTorch Implementation of Faster R-CNN. J Yang*, J Lu*, D Batra, D Parikh. https://github.com/jwyang/faster-rcnn.pytorch, 2018. Cited by 108.
X-LXMERT: Paint, caption and answer questions with multi-modal transformers. J Cho, J Lu, D Schwenk, H Hajishirzi, A Kembhavi. arXiv preprint arXiv:2009.11278, 2020. Cited by 103.
Spatially aware multimodal transformers for TextVQA. Y Kant, D Batra, P Anderson, A Schwing, D Parikh, J Lu, H Agrawal. Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, 2020. Cited by 90.
Deeper LSTM and normalized CNN visual question answering model. J Lu, X Lin, D Batra, D Parikh. GitHub repository, 2015. Cited by 82.