RAR: Retrieving and ranking augmented MLLMs for visual recognition Z Liu, Z Sun, Y Zang, W Li, P Zhang, X Dong, Y Xiong, D Lin, J Wang arXiv preprint arXiv:2403.13805, 2024 | 10 | 2024 |
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Z Liu, T Chu, Y Zang, X Wei, X Dong, P Zhang, Z Liang, Y Xiong, Y Qiao, ... arXiv preprint arXiv:2406.11833, 2024 | 8 | 2024 |
MMLongBench-Doc: Benchmarking long-context document understanding with visualizations Y Ma, Y Zang, L Chen, M Chen, Y Jiao, X Li, X Lu, Z Liu, Y Ma, X Dong, ... arXiv preprint arXiv:2407.01523, 2024 | 6 | 2024 |
MIA-DPO: Multi-image augmented direct preference optimization for large vision-language models Z Liu, Y Zang, X Dong, P Zhang, Y Cao, H Duan, C He, Y Xiong, D Lin, ... arXiv preprint arXiv:2410.17637, 2024 | | 2024 |
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results J Wang, Y Zang, P Zhang, T Chu, Y Cao, Z Sun, Z Liu, X Dong, T Wu, ... arXiv preprint arXiv:2406.11739, 2024 | | 2024 |