Y Li, W Zhang, K Chen, Y Liu, P Li, R Gao… - arXiv e …, 2024 - ui.adsabs.harvard.edu
Abstract: Large Vision-Language Models (LVLMs), owing to their remarkable visual reasoning
ability to understand images and videos, have received widespread attention in the …