Z Li, B Yang, Q Liu, Z Ma, S Zhang, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Multimodal Models have demonstrated impressive capabilities in understanding
general vision-language tasks. However, because the supported input resolution is limited …