Learning to reason: End-to-end module networks for visual question answering R Hu, J Andreas, M Rohrbach, T Darrell, K Saenko Proceedings of the IEEE international conference on computer vision, 804-813, 2017 | 675 | 2017 |
Natural language object retrieval R Hu, H Xu, M Rohrbach, J Feng, K Saenko, T Darrell Proceedings of the IEEE conference on computer vision and pattern …, 2016 | 628 | 2016 |
FLAVA: A foundational language and vision alignment model A Singh, R Hu, V Goswami, G Couairon, W Galuba, M Rohrbach, D Kiela Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 538 | 2022 |
Grounding of textual phrases in images by reconstruction A Rohrbach, M Rohrbach, R Hu, T Darrell, B Schiele Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The …, 2016 | 537 | 2016 |
Speaker-follower models for vision-and-language navigation D Fried, R Hu, V Cirik, A Rohrbach, J Andreas, LP Morency, ... Advances in neural information processing systems 31, 2018 | 492 | 2018 |
Segmentation from natural language expressions R Hu, M Rohrbach, T Darrell Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The …, 2016 | 415 | 2016 |
Modeling relationships in referential expressions with compositional modular networks R Hu, M Rohrbach, J Andreas, T Darrell, K Saenko Proceedings of the IEEE conference on computer vision and pattern …, 2017 | 411 | 2017 |
LSDA: Large scale detection through adaptation J Hoffman, S Guadarrama, ES Tzeng, R Hu, J Donahue, R Girshick, ... Advances in neural information processing systems 27, 2014 | 379 | 2014 |
Learning to segment every thing R Hu, P Dollár, K He, T Darrell, R Girshick Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 348 | 2018 |
ConvNeXt V2: Co-designing and scaling convnets with masked autoencoders S Woo, S Debnath, R Hu, X Chen, Z Liu, IS Kweon, S Xie Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 344 | 2023 |
UniT: Multimodal Multitask Learning with a Unified Transformer R Hu, A Singh arXiv preprint arXiv:2102.10772, 2021 | 344 | 2021 |
TextCaps: a dataset for image captioning with reading comprehension O Sidorov, R Hu, M Rohrbach, A Singh Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23 …, 2020 | 286 | 2020 |
Grounding visual explanations L Anne Hendricks, R Hu, T Darrell, Z Akata Proceedings of the European Conference on Computer Vision (ECCV), 264-279, 2018 | 232 | 2018 |
Explainable neural computation via stack neural module networks R Hu, J Andreas, T Darrell, K Saenko Proceedings of the European conference on computer vision (ECCV), 53-69, 2018 | 218 | 2018 |
Iterative answer prediction with pointer-augmented multimodal transformers for TextVQA R Hu, A Singh, T Darrell, M Rohrbach Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 215 | 2020 |
Scaling language-image pre-training via masking Y Li, H Fan, R Hu, C Feichtenhofer, K He Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 212 | 2023 |
Language-conditioned graph networks for relational reasoning R Hu, A Rohrbach, T Darrell, K Saenko Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 181 | 2019 |
ConvNeXt V2: Co-designing and scaling convnets with masked autoencoders S Woo, S Debnath, R Hu, X Chen, Z Liu, IS Kweon, S Xie arXiv preprint arXiv:2301.00808 5 (8), 2023 | 156 | 2023 |
Generating counterfactual explanations with natural language LA Hendricks, R Hu, T Darrell, Z Akata arXiv preprint arXiv:1806.09809, 2018 | 104 | 2018 |
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation R Hu, D Fried, A Rohrbach, D Klein, T Darrell, K Saenko arXiv preprint arXiv:1906.00347, 2019 | 89 | 2019 |