Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding A Fukui, DH Park, D Yang, A Rohrbach, T Darrell, M Rohrbach Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016 | 1771 | 2016 |
A Dataset for Movie Description A Rohrbach, M Rohrbach, N Tandon, B Schiele Computer Vision and Pattern Recognition (CVPR), 2015 | 601 | 2015 |
Women also Snowboard: Overcoming Bias in Captioning Models LA Hendricks, K Burns, K Saenko, T Darrell, A Rohrbach Proceedings of the European Conference on Computer Vision (ECCV), 2018 | 536* | 2018 |
Grounding of textual phrases in images by reconstruction A Rohrbach, M Rohrbach, R Hu, T Darrell, B Schiele European Conference on Computer Vision (ECCV), 2016 | 535 | 2016 |
Speaker-Follower Models for Vision-and-Language Navigation D Fried, R Hu, V Cirik, A Rohrbach, J Andreas, LP Morency, ... Conference on Neural Information Processing Systems (NeurIPS), 2018 | 484 | 2018 |
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence DH Park, LA Hendricks, Z Akata, A Rohrbach, B Schiele, T Darrell, ... Computer Vision and Pattern Recognition (CVPR), 2018 | 468 | 2018 |
Object Hallucination in Image Captioning A Rohrbach, LA Hendricks, K Burns, T Darrell, K Saenko Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018 | 380 | 2018 |
Movie Description A Rohrbach, A Torabi, M Rohrbach, N Tandon, C Pal, H Larochelle, ... International Journal of Computer Vision (IJCV), 2017 | 380 | 2017 |
How Much Can CLIP Benefit Vision-and-Language Tasks? S Shen, LH Li, H Tan, M Bansal, A Rohrbach, KW Chang, Z Yao, ... International Conference on Learning Representations (ICLR), 2022 | 366 | 2022 |
Textual Explanations for Self-Driving Vehicles J Kim, A Rohrbach, T Darrell, J Canny, Z Akata European Conference on Computer Vision (ECCV), 2018 | 320 | 2018 |
Coherent Multi-Sentence Video Description with Variable Level of Detail A Rohrbach, M Rohrbach, W Qiu, A Friedrich, M Pinkal, B Schiele German Conference on Pattern Recognition (GCPR), 2014 | 268 | 2014 |
Recognizing fine-grained and composite activities using hand-centric features and script data M Rohrbach, A Rohrbach, M Regneri, S Amin, M Andriluka, M Pinkal, ... International Journal of Computer Vision (IJCV), 2015 | 221 | 2015 |
Language-Conditioned Graph Networks for Relational Reasoning R Hu, A Rohrbach, T Darrell, K Saenko International Conference on Computer Vision (ICCV), 2019 | 179 | 2019 |
Video Object Segmentation with Language Referring Expressions A Khoreva, A Rohrbach, B Schiele Asian Conference on Computer Vision (ACCV), 2018 | 173 | 2018 |
The long-short story of movie description A Rohrbach, M Rohrbach, B Schiele German Conference on Pattern Recognition (GCPR), 2015 | 158 | 2015 |
Robust Change Captioning DH Park, T Darrell, A Rohrbach International Conference on Computer Vision (ICCV), 2019 | 132 | 2019 |
DETReg: Unsupervised pretraining with region priors for object detection A Bar, X Wang, V Kantorov, CJ Reed, R Herzig, G Chechik, A Rohrbach, ... Conference on Computer Vision and Pattern Recognition (CVPR), 2022 | 126 | 2022 |
Fooling Vision and Language Models Despite Localization and Attention Mechanism X Xu, X Chen, C Liu, A Rohrbach, T Darrell, D Song Computer Vision and Pattern Recognition (CVPR), 2018 | 115* | 2018 |
More control for free! image synthesis with semantic diffusion guidance X Liu, DH Park, S Azadi, G Zhang, A Chopikyan, Y Hu, H Shi, A Rohrbach, ... IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023 | 112 | 2023 |
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering T Maharaj, N Ballas, A Rohrbach, A Courville, C Pal Computer Vision and Pattern Recognition (CVPR), 2017 | 112 | 2017 |