InternVideo: General video foundation models via generative and discriminative learning Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ... arXiv preprint arXiv:2212.03191, 2022 | 188 | 2022 |
InternVid: A large-scale video-text dataset for multimodal understanding and generation Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ... ICLR2023, 2023 | 74 | 2023 |
DCAN: Improving temporal action detection via dual context aggregation G Chen, YD Zheng, L Wang, T Lu Proceedings of the AAAI conference on artificial intelligence 36 (1), 248-257, 2022 | 53 | 2022 |
VideoLLM: Modeling video sequence with large language models G Chen, YD Zheng, J Wang, J Xu, Y Huang, J Pan, Y Wang, Y Wang, ... arXiv preprint arXiv:2305.13292, 2023 | 52 | 2023 |
InternVideo-Ego4D: A pack of champion solutions to Ego4D challenges G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ... arXiv preprint arXiv:2211.09529, 2022 | 35 | 2022 |
MVBench: A comprehensive multi-modal video understanding benchmark K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 32 | 2024 |
InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks Z Chen, J Wu, W Wang, W Su, G Chen, S Xing, M Zhong, Q Zhang, X Zhu, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 28 | 2024 |
BasicTAD: An astounding RGB-only baseline for temporal action detection M Yang, G Chen, YD Zheng, T Lu, L Wang Computer Vision and Image Understanding 232, 103692, 2023 | 26 | 2023 |
AVSegFormer: Audio-visual segmentation with transformer S Gao, Z Chen, G Chen, W Wang, T Lu Proceedings of the AAAI Conference on Artificial Intelligence 38 (11), 12155 …, 2024 | 16 | 2024 |
Video Mamba Suite: State space model as a versatile alternative for video understanding G Chen, Y Huang, J Xu, B Pei, Z Chen, Z Li, J Wang, K Li, T Lu, L Wang arXiv preprint arXiv:2403.09626, 2024 | 16 | 2024 |
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation Z Chen, J Wang, W Wang, G Chen, E Xie, P Luo, T Lu arXiv preprint arXiv:2111.02394, 2021 | 14 | 2021 |
Memory-and-Anticipation Transformer for Online Action Understanding J Wang, G Chen, Y Huang, L Wang, T Lu Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 13 | 2023 |
InternVideo2: Scaling video foundation models for multimodal video understanding Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, J Xu, Z Wang, ... arXiv preprint arXiv:2403.15377, 2024 | 11 | 2024 |
MRSN: Multi-Relation Support Network for Video Action Detection YD Zheng, G Chen, M Yuan, T Lu 2023 IEEE International Conference on Multimedia and Expo (ICME), 1026-1031, 2023 | 5 | 2023 |
Retrieval-augmented egocentric video captioning J Xu, Y Huang, J Hou, G Chen, Y Zhang, R Feng, W Xie arXiv preprint arXiv:2401.00789, 2024 | 2 | 2024 |
Champion Solution for the WSDM2023 Toloka VQA Challenge S Gao, Z Chen, G Chen, W Wang, T Lu arXiv preprint arXiv:2301.09045, 2023 | 2 | 2023 |
Matching Compound Prototypes for Few-Shot Action Recognition Y Huang, L Yang, G Chen, H Zhang, F Lu, Y Sato International Journal of Computer Vision, 1-26, 2024 | | 2024 |
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World Y Huang, G Chen, J Xu, M Zhang, L Yang, B Pei, H Zhang, L Dong, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | | 2024 |