Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have recently achieved impressive general-purpose
vision-language capabilities through visual instruction tuning. However, current …