Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Robustness of Structured Data Extraction from In-Plane Rotated Documents Using Multi-Modal Large Language Models (LLM)

A Biswas, W Talukdar - Journal of Artificial Intelligence Research, 2024 - arxiv.org
Multi-modal large language models (LLMs) have shown remarkable performance in various
natural language processing tasks, including data extraction from documents. However, the …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Visual hallucinations of multi-modal large language models

W Huang, H Liu, M Guo, NZ Gong - arXiv preprint arXiv:2402.14683, 2024 - arxiv.org
Visual hallucination (VH) means that a multi-modal LLM (MLLM) imagines incorrect details
about an image in visual question answering. Existing studies find VH instances only in …

Less is more: Mitigating multimodal hallucination from an EOS decision perspective

Z Yue, L Zhang, Q Jin - arXiv preprint arXiv:2402.14545, 2024 - arxiv.org
Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they
may create content that is not present in the visual inputs. In this paper, we explore a new …

What if...?: Counterfactual inception to mitigate hallucination effects in large multimodal models

J Kim, YJ Kim, YM Ro - arXiv preprint arXiv:2403.13513, 2024 - arxiv.org
This paper presents a way of enhancing the reliability of Large Multimodal Models (LMMs) in
addressing hallucination effects, where models generate incorrect or unrelated responses …

LightHouse: A Survey of AGI Hallucination

F Wang - arXiv preprint arXiv:2401.06792, 2024 - arxiv.org
With the development of artificial intelligence, large-scale models have become increasingly
intelligent. However, numerous studies indicate that hallucinations within these large …

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Y Liu, Z Liang, Y Wang, M He, J Li, B Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in
visual understanding and reasoning, providing seemingly reasonable answers, such as image …