Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues …
A Biswas, W Talukdar - Journal of Artificial Intelligence Research, 2024 - arxiv.org
Multi-modal large language models (LLMs) have shown remarkable performance in various natural language processing tasks, including data extraction from documents. However, the …
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models …
T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor …
W Huang, H Liu, M Guo, NZ Gong - arXiv preprint arXiv:2402.14683, 2024 - arxiv.org
Visual hallucination (VH) means that a multi-modal LLM (MLLM) imagines incorrect details about an image in visual question answering. Existing studies find VH instances only in …
Z Yue, L Zhang, Q Jin - arXiv preprint arXiv:2402.14545, 2024 - arxiv.org
Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new …
J Kim, YJ Kim, YM Ro - arXiv preprint arXiv:2403.13513, 2024 - arxiv.org
This paper presents a way of enhancing the reliability of Large Multimodal Models (LMMs) in addressing hallucination effects, where models generate incorrect or unrelated responses …
F Wang - arXiv preprint arXiv:2401.06792, 2024 - arxiv.org
With the development of artificial intelligence, large-scale models have become increasingly intelligent. However, numerous studies indicate that hallucinations within these large …
Y Liu, Z Liang, Y Wang, M He, J Li, B Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in visual understanding and reasoning, providing sightly reasonable answers, such as image …