Title, authors, venue | Cited by | Year |
--- | --- | --- |
Real-time action recognition with enhanced motion vector CNNs B Zhang, L Wang, Z Wang, Y Qiao, H Wang CVPR, 2718-2726, 2016 | 514 | 2016 |
Real-time action recognition with deeply transferred motion vector CNNs B Zhang, L Wang, Z Wang, Y Qiao, H Wang IEEE Transactions on Image Processing 27 (5), 2326-2339, 2018 | 181 | 2018 |
Cross-Modal and Hierarchical Modeling of Video and Text B Zhang, H Hu, F Sha Proceedings of the European Conference on Computer Vision (ECCV), 374-390, 2018 | 154 | 2018 |
CUHK & ETHZ & SIAT submission to ActivityNet challenge 2016 Y Xiong, L Wang, Z Wang, B Zhang, H Song, W Li, D Lin, Y Qiao, ... CVPR'16 ActivityNet workshop, 2016 | 137 | 2016 |
Ferret: Refer and ground anything anywhere at any granularity H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang, L Cao, SF Chang, ... arXiv preprint arXiv:2310.07704, 2023 | 101 | 2023 |
Weakly supervised PatchNets: Describing and aggregating local patches for scene recognition Z Wang, L Wang, Y Wang, B Zhang, Y Qiao IEEE Transactions on Image Processing 26 (4), 2028-2041, 2017 | 97 | 2017 |
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training B McKinzie, Z Gan, JP Fauconnier, S Dodge, B Zhang, P Dufter, D Shah, ... arXiv preprint arXiv:2403.09611, 2024 | 55 | 2024 |
CUHK & ETHZ & SIAT submission to ActivityNet challenge 2017 Y Zhao, B Zhang, Z Wu, S Yang, L Zhou, S Yan, L Wang, Y Xiong, D Lin, ... CVPR'17 ActivityNet workshop 8, 8, 2017 | 46 | 2017 |
Co-training Transformer with Videos and Images Improves Action Recognition B Zhang, J Yu, C Fifty, W Han, AM Dai, R Pang, F Sha arXiv preprint arXiv:2112.07175, 2021 | 44 | 2021 |
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus B Zhang, H Hu, J Lee, M Zhao, S Chammas, V Jain, E Ie, F Sha arXiv preprint arXiv:2011.09046, 2020 | 28 | 2020 |
Learning to Represent Image and Text with Denotation Graph B Zhang, H Hu, V Jain, E Ie, F Sha EMNLP'20, 823-839, 2020 | 23 | 2020 |
Topic Augmented Generator for Abstractive Summarization M Ailem, B Zhang, F Sha BayLearn, 2019 | 19 | 2019 |
MIC-TJU at MediaEval Violent Scenes Detection (VSD) 2014. B Zhang, Y Yi, H Wang, J Yu MediaEval, 2014 | 19 | 2014 |
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? L Qiu, H Hu, B Zhang, P Shaw, F Sha EMNLP'21, 2021 | 18 | 2021 |
From scarcity to efficiency: Improving CLIP training via visual-enriched captions Z Lai, H Zhang, B Zhang, W Wu, H Bai, A Timofeev, X Du, Z Gan, J Shan, ... arXiv preprint arXiv:2310.07699, 2023 | 16 | 2023 |
Compressing LLMs: The Truth is Rarely Pure and Never Simple A Jaiswal, Z Gan, X Du, B Zhang, Z Wang, Y Yang arXiv preprint arXiv:2310.01382, 2023 | 12 | 2023 |
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness L Cao, B Zhang, C Chen, Y Yang, X Du, W Zhang, Z Lu, Y Zheng arXiv preprint arXiv:2305.05095, 2023 | 10 | 2023 |
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens C Chen, B Zhang, L Cao, J Shen, T Gunter, AM Jose, A Toshev, J Shlens, ... arXiv preprint arXiv:2301.13081, 2023 | 8 | 2023 |
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images M Ailem, B Zhang, A Bellet, P Denis, F Sha EMNLP'18, 1478-1487, 2018 | 8 | 2018 |
Learning correlations for human action recognition in videos Y Yi, H Wang, B Zhang Multimedia Tools and Applications 76, 18891-18913, 2017 | 8 | 2017 |