H Yao, W Wu, T Yang, YX Song, M Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Do we fully leverage the potential of visual encoder in Multimodal Large Language Models
(MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has …