Compositional temporal grounding with structured variational cross-graph correspondence learning. J Li, J Xie, L Qian, L Zhu, S Tang, F Wu, Y Yang, Y Zhuang, XE Wang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 60 | 2022 |
Fine-grained semantically aligned vision-language pre-training. J Li, X He, L Wei, L Qian, L Zhu, L Xie, Y Zhuang, Q Tian, S Tang. Advances in neural information processing systems 35, 7290-7303, 2022 | 51 | 2022 |
Dilated context integrated network with cross-modal consensus for temporal emotion localization in videos. J Li, J Xie, L Zhu, L Qian, S Tang, W Zhang, H Shi, S Zhang, L Wei, Q Tian, ... Proceedings of the 30th ACM International Conference on Multimedia, 5083-5092, 2022 | 9 | 2022 |
Momentor: Advancing video large language model with fine-grained temporal reasoning. L Qian, J Li, Y Wu, Y Ye, H Fei, TS Chua, Y Zhuang, S Tang. arXiv preprint arXiv:2402.11435, 2024 | 6 | 2024 |