C Tao, S Su, X Zhu, C Zhang, Z Chen, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advance of Large Language Models (LLMs) has catalyzed the development of
Vision-Language Models (VLMs). Monolithic VLMs, which avoid modality-specific encoders …