OmniVL: One foundation model for image-language and video-language tasks J Wang, D Chen, Z Wu, C Luo, L Zhou, Y Zhao, Y Xie, C Liu, YG Jiang, ... Advances in neural information processing systems 35, 5696-5710, 2022 | 115 | 2022 |
A battle of network structures: An empirical study of CNN, Transformer, and MLP Y Zhao, G Wang, C Tang, C Luo, W Zeng, ZJ Zha arXiv preprint arXiv:2108.13002, 2021 | 90 | 2021 |
Sparse MLP for image recognition: Is self-attention really necessary? C Tang, Y Zhao, G Wang, C Luo, W Xie, W Zeng Proceedings of the AAAI conference on artificial intelligence 36 (2), 2344-2351, 2022 | 87 | 2022 |
When shift operation meets vision transformer: An extremely simple alternative to attention mechanism G Wang, Y Zhao, C Tang, C Luo, W Zeng Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2423-2430, 2022 | 52 | 2022 |
Self-supervised visual representations learning by contrastive mask prediction Y Zhao, G Wang, C Luo, W Zeng, ZJ Zha Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 43 | 2021 |
Peripheral vision transformer J Min, Y Zhao, C Luo, M Cho Advances in Neural Information Processing Systems 35, 32097-32111, 2022 | 27 | 2022 |
Look before you match: Instance understanding matters in video object segmentation J Wang, D Chen, Z Wu, C Luo, C Tang, X Dai, Y Zhao, Y Xie, L Yuan, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023 | 26 | 2023 |
ADriver-I: A general world model for autonomous driving F Jia, W Mao, Y Liu, Y Zhao, Y Wen, C Zhang, X Zhang, T Wang arXiv preprint arXiv:2311.13549, 2023 | 14 | 2023 |
RetrieverTTS: Modeling decomposed factors for text-based speech insertion D Yin, C Tang, Y Liu, X Wang, Z Zhao, Y Zhao, Z Xiong, S Zhao, C Luo arXiv preprint arXiv:2206.13865, 2022 | 12 | 2022 |
Multi-scale group transformer for long sequence modeling in speech separation Y Zhao, C Luo, ZJ Zha, W Zeng Proceedings of the Twenty-Ninth International Conference on International …, 2021 | 12 | 2021 |
Streaming video model Y Zhao, C Luo, C Tang, D Chen, N Codella, ZJ Zha Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 10 | 2023 |
Panacea: Panoramic and controllable video generation for autonomous driving Y Wen, Y Zhao, Y Liu, F Jia, Y Wang, C Luo, C Zhang, T Wang, X Sun, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 8 | 2024 |
Zero-shot text-to-speech for text-based insertion in audio narration C Tang, C Luo, Z Zhao, D Yin, Y Zhao, W Zeng arXiv preprint arXiv:2109.05426, 2021 | 8 | 2021 |
General-purpose speech representation learning through a self-supervised multi-granularity framework Y Zhao, D Yin, C Luo, Z Zhao, C Tang, W Zeng, ZJ Zha arXiv preprint arXiv:2102.01930, 2021 | 7 | 2021 |
Stream query denoising for vectorized HD map construction S Wang, F Jia, Y Liu, Y Zhao, Z Chen, T Wang, C Zhang, X Zhang, F Zhao arXiv preprint arXiv:2401.09112, 2024 | 2 | 2024 |
SubjectDrive: Scaling generative data in autonomous driving via subject control B Huang, Y Wen, Y Zhao, Y Hu, Y Liu, F Jia, W Mao, T Wang, C Zhang, ... arXiv preprint arXiv:2403.19438, 2024 | 1 | 2024 |
VLM-Eval: A General Evaluation on Video Large Language Models S Li, Y Zhang, Y Zhao, Q Wang, F Jia, Y Liu, T Wang arXiv preprint arXiv:2311.11865, 2023 | | 2023 |
Attention-Guided Contrastive Masked Image Modeling for Transformer-Based Self-Supervised Learning Y Zhan, Y Zhao, C Luo, Y Zhang, X Sun 2023 IEEE International Conference on Image Processing (ICIP), 2490-2494, 2023 | | 2023 |
Filler Word Detection with Hard Category Mining and Inter-Category Focal Loss Z Zhao, L Wu, C Tang, D Yin, Y Zhao, C Luo ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | | 2023 |
T2D: Spatiotemporal Feature Learning Based on Triple 2D Decomposition Y Zhao, C Luo, C Tang, D Chen, NC Codella, L Yuan, ZJ Zha | | |