D Yang, B Cao, G Chen, C Jiang - arXiv e-prints, 2024 - ui.adsabs.harvard.edu
Abstract: Multi-modal Large Language Models (MLLMs) demonstrate remarkable success
across various vision-language tasks. However, they suffer from visual hallucination, where …