Seeing is believing: Mitigating hallucination in large vision-language models via CLIP-guided...

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Cited by 133 · Related articles · All 2 versions

[PDF] arxiv.org

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

Cited by 42 · Related articles · All 3 versions

[PDF] aclanthology.org

A comprehensive survey of hallucination in large language, image, video and audio foundation models

P Sahoo, P Meharia, A Ghosh, S Saha… - Findings of the …, 2024 - aclanthology.org

The rapid advancement of foundation models (FMs) across language, image, audio, and
video domains has shown remarkable capabilities in diverse tasks. However, the …

Cited by 3 · Related articles

[PDF] arxiv.org

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

Cited by 96 · Related articles · All 3 versions

[PDF] arxiv.org

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org

Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Cited by 58 · Related articles · All 2 versions

[PDF] arxiv.org

UniMEL: A unified framework for multimodal entity linking with large language models

Q Liu, Y He, T Xu, D Lian, C Liu, Z Zheng… - Proceedings of the 33rd …, 2024 - dl.acm.org

Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions
within multimodal contexts to the referent entities in a multimodal knowledge base, such as …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

On Erroneous Agreements of CLIP Image Embeddings

S Li, PW Koh, SS Du - arXiv preprint arXiv:2411.05195, 2024 - arxiv.org

Recent research suggests that the failures of Vision-Language Models (VLMs) at visual
reasoning often stem from erroneous agreements--when semantically distinct images are …

Cited by 2 · Related articles · All 2 versions

[PDF] aclanthology.org

Game on Tree: Visual Hallucination Mitigation via Coarse-to-Fine View Tree and Game Theory

X Zhuang, Z Zhu, Z Chen, Y Xie, L Liang… - Proceedings of the …, 2024 - aclanthology.org

Large Vision-Language Models (LVLMs) may produce outputs that are unfaithful to
reality, also known as visual hallucinations (VH), which hinders their application in …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Mitigating Object Hallucination via Data Augmented Contrastive Tuning

P Sarkar, S Ebrahimi, A Etemad, A Beirami… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to
hallucinate factually inaccurate information. In this work, we address object hallucinations in …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

C Cui, A Zhang, Y Zhou, Z Chen, G Deng… - arXiv preprint arXiv …, 2024 - arxiv.org

The recent advancements in large language models (LLMs) and pre-trained vision models
have accelerated the development of vision-language large models (VLLMs), enhancing the …

Cited by 1 · Related articles · All 2 versions
