BEVT: BERT pretraining of video transformers R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, YG Jiang, L Zhou, L Yuan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 207 | 2022 |
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension B Li, R Wang, G Wang, Y Ge, Y Ge, Y Shan arXiv preprint arXiv:2307.16125, 2023 | 199 | 2023 |
Cross-domain contrastive learning for unsupervised domain adaptation R Wang, Z Wu, Z Weng, J Chen, GJ Qi, YG Jiang IEEE Transactions on Multimedia, 2022 | 142 | 2022 |
Masked video distillation: Rethinking masked feature modeling for self-supervised video representation learning R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, L Yuan, YG Jiang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 54 | 2023 |
SEED-Bench-2: Benchmarking Multimodal Large Language Models B Li, Y Ge, Y Ge, G Wang, R Wang, R Zhang, Y Shan arXiv preprint arXiv:2311.17092, 2023 | 27 | 2023 |
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling R Wang, Z Wu, D Chen, Y Chen, X Dai, M Liu, L Zhou, L Yuan, YG Jiang arXiv preprint arXiv:2208.12257, 2022 | 3 | 2022 |
A Multimodal Framework for Video Ads Understanding Z Weng, L Meng, R Wang, Z Wu, YG Jiang Proceedings of the 29th ACM International Conference on Multimedia, 4843-4847, 2021 | 3 | 2021 |
Exploring the Consistency of Segment-level and Video-level Predictions for Improved Temporal Concept Localization in Videos Z Weng, R Wang, YG Jiang | 1 | |