Egocentric video-language pretraining KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao Neural Information Processing Systems (NeurIPS) 2 (3), 2022 | 129 | 2022 |
Multi-modal graph neural network for joint reasoning on vision and scene text D Gao, K Li, R Wang, S Shan, X Chen IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12746 …, 2020 | 128 | 2020 |
Show-1: Marrying pixel and latent diffusion models for text-to-video generation DJ Zhang, JZ Wu, JW Liu, R Zhao, L Ran, Y Gu, D Gao, MZ Shou arXiv preprint arXiv:2309.15818, 2023 | 76 | 2023 |
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14773 …, 2023 | 57 | 2023 |
UniVTG: Towards Unified Video-Language Temporal Grounding KQ Lin, P Zhang, J Chen, S Pramanick, D Gao, AJ Wang, R Yan, MZ Shou IEEE/CVF International Conference on Computer Vision (ICCV), 2023 | 44 | 2023 |
AssistGPT: A general multi-modal assistant that can plan, execute, inspect, and learn D Gao, L Ji, L Zhou, KQ Lin, J Chen, Z Fan, MZ Shou arXiv preprint arXiv:2306.08640, 2023 | 44 | 2023 |
CRIC: A VQA dataset for compositional reasoning on vision and commonsense D Gao, R Wang, S Shan, X Chen IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 | 26* | 2022 |
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments D Gao, R Wang, Z Bai, X Chen IEEE/CVF International Conference on Computer Vision (ICCV), 1675-1685, 2021 | 25 | 2021 |
Egocentric video-language pretraining KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao, Weijie Kong, et al. NeurIPS 35 (7575-7586), 26, 2022 | 22 | 2022 |
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval Y Wang, D Gao, L Yu, W Lei, M Feiszli, MZ Shou European Conference on Computer Vision (ECCV), 2022 | 21 | 2022 |
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant B Wong, J Chen, Y Wu, SW Lei, D Mao, D Gao, MZ Shou European Conference on Computer Vision (ECCV), 2022 | 21 | 2022 |
Symbolic replay: Scene graph as prompt for continual learning on VQA task SW Lei, D Gao, JZ Wu, Y Wang, W Liu, M Zhang, MZ Shou The AAAI Conference on Artificial Intelligence (AAAI), 2023 | 19 | 2023 |
CONE: An efficient coarse-to-fine alignment framework for long video temporal grounding Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan Annual Meeting of the Association for Computational Linguistics (ACL), 2022 | 19 | 2022 |
Affordance grounding from demonstration video to target image J Chen, D Gao, KQ Lin, MZ Shou IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6799-6808, 2023 | 17 | 2023 |
CVPR 2023 text guided video editing competition JZ Wu, X Li, D Gao, Z Dong, J Bai, A Singh, X Xiang, Y Li, Z Huang, Y Sun, ... arXiv preprint arXiv:2310.16003, 2023 | 16 | 2023 |
Learning to recognize visual concepts for visual question answering with structural label space D Gao, R Wang, S Shan, X Chen IEEE Journal of Selected Topics in Signal Processing (JSTSP) 14 (3), 494-505, 2020 | 12 | 2020 |
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023 Z Hou, L Ji, D Gao, W Zhong, K Yan, C Li, WK Chan, CW Ngo, N Duan, ... arXiv preprint arXiv:2306.15255, 2023 | 9 | 2023 |
AssistSR: Task-oriented video segment retrieval for personal AI assistant SW Lei, D Gao, Y Wang, D Mao, Z Liang, L Ran, MZ Shou Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021 | 8* | 2021 |
AssistGUI: Task-Oriented PC Graphical User Interface Automation D Gao, L Ji, Z Bai, M Ouyang, P Li, D Mao, Q Wu, W Zhang, P Wang, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 7* | 2024 |
An efficient coarse-to-fine alignment framework @ Ego4D Natural Language Queries Challenge 2022 Z Hou, W Zhong, L Ji, D Gao, K Yan, WK Chan, CW Ngo, Z Shou, N Duan arXiv preprint arXiv:2211.08776, 2022 | 7 | 2022 |