J Chen, Y Liu, D Li, X An, Z Feng, Y Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent
capabilities in instruction following and reasoning, has greatly advanced the field of visual …