Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis; C Fu, Y Dai, Y Luo, L Li, S Ren, R Zhang, Z Wang, C Zhou, Y Shen, ...; arXiv preprint arXiv:2405.21075, 2024 | 57 | 2024 |
A challenger to GPT-4V? Early explorations of Gemini in visual expertise; C Fu, R Zhang, H Lin, Z Wang, T Gao, Y Luo, Y Huang, Z Zhang, L Qiu, ...; arXiv preprint arXiv:2312.12436, 2023 | 43 | 2023 |
A unified framework for 3D point cloud visual grounding; H Lin, Y Luo, X Zheng, L Li, F Chao, T Jin, D Luo, Y Wang, L Cao, R Ji; arXiv preprint arXiv:2308.11887, 2023 | 7 | 2023 |
Attention‐aware spatio‐temporal learning for multi‐view gait‐based age estimation and gender classification; B Huang, Y Luo, J Xie, J Pan, C Zhou; IET Computer Vision, 2022 | 4 | 2022 |
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension; Y Luo, X Zheng, X Yang, G Li, H Lin, J Huang, J Ji, F Chao, J Luo, R Ji; arXiv preprint arXiv:2411.13093, 2024 | | 2024 |
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization; Y Luo, H Lin, X Zheng, Y Jiang, F Chao, J Hu, G Jiang, S Zhang, R Ji; arXiv preprint arXiv:2404.11064, 2024 | | 2024 |