Masked-attention mask transformer for universal image segmentation B Cheng, I Misra, AG Schwing, A Kirillov, R Girdhar Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 1521 | 2022 |
Learning a Predictable and Generative Vector Representation for Objects R Girdhar, DF Fouhey, M Rodriguez, A Gupta European Conference on Computer Vision (ECCV) 2016, 2016 | 827 | 2016 |
Video Action Transformer Network R Girdhar, J Carreira, C Doersch, A Zisserman Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 2019 | 825 | 2019 |
Ego4D: Around the world in 3,000 hours of egocentric video K Grauman, A Westbury, E Byrne, Z Chavis, A Furnari, R Girdhar, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 669 | 2022 |
ActionVLAD: Learning spatio-temporal aggregation for action classification R Girdhar, D Ramanan, A Gupta, J Sivic, B Russell Conference on Computer Vision and Pattern Recognition (CVPR), 2017, 2017 | 569 | 2017 |
ImageBind: One embedding space to bind them all R Girdhar, A El-Nouby, Z Liu, M Singh, KV Alwala, A Joulin, I Misra Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 495 | 2023 |
Detecting twenty-thousand classes using image-level supervision X Zhou, R Girdhar, A Joulin, P Krähenbühl, I Misra European Conference on Computer Vision, 350-368, 2022 | 460 | 2022 |
An end-to-end transformer model for 3D object detection I Misra, R Girdhar, A Joulin Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 428 | 2021 |
Attentional pooling for action recognition R Girdhar, D Ramanan Advances in Neural Information Processing Systems (NeurIPS), 2017, 2017 | 415 | 2017 |
Detect-and-Track: Efficient Pose Estimation in Videos R Girdhar, G Gkioxari, L Torresani, M Paluri, D Tran Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 2018 | 295 | 2018 |
Self-supervised pretraining of 3D features on any point-cloud Z Zhang, R Girdhar, A Joulin, I Misra Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 246 | 2021 |
Anticipative Video Transformer R Girdhar, K Grauman IEEE/CVF International Conference on Computer Vision (ICCV), 2021 | 202 | 2021 |
Omnivore: A single model for many visual modalities R Girdhar, M Singh, N Ravi, L Van Der Maaten, A Joulin, I Misra Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 186 | 2022 |
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning R Girdhar, D Ramanan International Conference on Learning Representations (ICLR), 2020, 2020 | 169 | 2020 |
Cut and learn for unsupervised object detection and instance segmentation X Wang, R Girdhar, SX Yu, I Misra Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 118 | 2023 |
Learning video representations from large language models Y Zhao, I Misra, P Krähenbühl, R Girdhar Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 105 | 2023 |
Binge Watching: Scaling Affordance Learning from Sitcoms X Wang, R Girdhar, A Gupta Conference on Computer Vision and Pattern Recognition (CVPR), 2017, 2017 | 87 | 2017 |
OmniMAE: Single model masked pretraining on images and videos R Girdhar, A El-Nouby, M Singh, KV Alwala, A Joulin, I Misra Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 81* | 2023 |
Emu video: Factorizing text-to-video generation by explicit image conditioning R Girdhar, M Singh, A Brown, Q Duval, S Azadi, SS Rambhatla, A Shah, ... arXiv preprint arXiv:2311.10709, 2023 | 76* | 2023 |
DistInit: Learning Video Representations without a Single Labeled Video R Girdhar, D Tran, L Torresani, D Ramanan International Conference on Computer Vision (ICCV) 2019, 2019 | 73 | 2019 |