VizWiz Grand Challenge: Answering visual questions from blind people D Gurari, Q Li, AJ Stangl, A Guo, C Lin, K Grauman, J Luo, JP Bigham Proceedings of the IEEE conference on computer vision and pattern …, 2018 | 631 | 2018 |
Action recognition by learning deep multi-granular spatio-temporal video representation Q Li, Z Qiu, T Yao, T Mei, Y Rui, J Luo Proceedings of the 2016 ACM on international conference on multimedia …, 2016 | 152 | 2016 |
VQA-E: Explaining, elaborating, and enhancing your answers for visual questions Q Li, Q Tao, S Joty, J Cai, J Luo Proceedings of the European Conference on Computer Vision (ECCV), 552-567, 2018 | 110 | 2018 |
VizWiz-Priv: A dataset for recognizing the presence and purpose of private visual information in images taken by blind people D Gurari, Q Li, C Lin, Y Zhao, A Guo, A Stangl, JP Bigham Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 98 | 2019 |
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning Q Li, S Huang, Y Hong, Y Chen, YN Wu, SC Zhu ICML, 2020 | 83 | 2020 |
Tell-and-answer: Towards explainable visual question answering using attributes and captions Q Li, J Fu, D Yu, T Mei, J Luo EMNLP, 2018 | 67 | 2018 |
SQA3D: Situated question answering in 3D scenes X Ma, S Yong, Z Zheng, Q Li, Y Liang, SC Zhu, S Huang arXiv preprint arXiv:2210.07474, 2022 | 66 | 2022 |
Learning by fixing: Solving math word problems with weak supervision Y Hong, Q Li, D Ciao, S Huang, SC Zhu Proceedings of the AAAI conference on artificial intelligence 35 (6), 4959-4967, 2021 | 59 | 2021 |
Why does a visual question have different answers? N Bhattacharya, Q Li, D Gurari Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019 | 54 | 2019 |
3D-VisTA: Pre-trained transformer for 3D vision and text alignment Z Zhu, X Ma, Y Chen, Z Deng, S Huang, Q Li Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 45 | 2023 |
VIREO @ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking PA Nguyen, Q Li, ZQ Cheng, YJ Lu, H Zhang, X Wu, CW Ngo IEEE, 2017 | 37 | 2017 |
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering Q Li, S Huang, Y Hong, SC Zhu ECCV, 2020 | 36 | 2020 |
SMART: A situation model for algebra story problems via attributed grammar Y Hong, Q Li, R Gong, D Ciao, S Huang, SC Zhu Proceedings of the AAAI conference on artificial intelligence 35 (14), 13009 …, 2021 | 35 | 2021 |
YouRefIt: Embodied reference understanding with language and gesture Y Chen, Q Li, D Kong, YL Kei, SC Zhu, T Gao, Y Zhu, S Huang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 33 | 2021 |
VLGrammar: Grounded grammar induction of vision and language Y Hong, Q Li, SC Zhu, S Huang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 31 | 2021 |
An embodied generalist agent in 3D world J Huang, S Yong, X Ma, X Linghu, P Li, Y Wang, Q Li, SC Zhu, B Jia, ... arXiv preprint arXiv:2311.12871, 2023 | 25 | 2023 |
Learning hierarchical video representation for action recognition Q Li, Z Qiu, T Yao, T Mei, Y Rui, J Luo International Journal of Multimedia Information Retrieval 6, 85-98, 2017 | 25 | 2017 |
Parameter-efficient fine-tuning for pre-trained vision models: A survey Y Xin, S Luo, H Zhou, J Du, X Liu, Y Fan, Q Li, Y Du arXiv preprint arXiv:2402.02242, 2024 | 24 | 2024 |
MSR Asia MSM at THUMOS Challenge 2015 Z Qiu, Q Li, T Yao, T Mei, Y Rui CVPR workshop 8, 2015 | 23 | 2015 |
Towards a unified foundation model: Jointly pre-training transformers on unpaired images and text Q Li, B Gong, Y Cui, D Kondratyuk, X Du, MH Yang, M Brown arXiv preprint arXiv:2112.07074, 2021 | 22 | 2021 |