InternVideo: General video foundation models via generative and discriminative learning Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ... arXiv preprint arXiv:2212.03191, 2022 | 189 | 2022 |
InternVideo-Ego4D: A pack of champion solutions to Ego4D challenges G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ... arXiv preprint arXiv:2211.09529, 2022 | 35 | 2022 |
InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks Z Chen, J Wu, W Wang, W Su, G Chen, S Xing, M Zhong, Q Zhang, X Zhu, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 28 | 2024 |
Asymmetric masked distillation for pre-training small foundation models Z Zhao, B Huang, S Xing, G Wu, Y Qiao, L Wang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 2 | 2024 |
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks J Wu, M Zhong, S Xing, Z Lai, Z Liu, W Wang, Z Chen, X Zhu, L Lu, T Lu, ... arXiv preprint arXiv:2406.08394, 2024 | | 2024 |