Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract: Large Vision-Language Models (LVLMs) have advanced considerably, intertwining
visual recognition and language understanding to generate content that is not only coherent …

Debiasing large visual language models

YF Zhang, W Yu, Q Wen, X Wang, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In the realms of computer vision and natural language processing, Large Vision-Language
Models (LVLMs) have become indispensable tools, proficient in generating textual …

Overcoming language priors via shuffling language bias for robust visual question answering

J Zhao, Z Yu, X Zhang, Y Yang - IEEE Access, 2023 - ieeexplore.ieee.org
Recent research has revealed the notorious language prior problem in visual question
answering (VQA) tasks based on visual-textual interaction, which indicates that well …

From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

N Xu, F Wang, S Zhang, H Poon, M Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Motivated by the in-context learning (ICL) capabilities of Large Language Models (LLMs),
multimodal LLMs with an additional visual modality also exhibit similar ICL abilities …

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

W An, F Tian, S Leng, J Nie, H Lin, QY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their great success across various multimodal tasks, Large Vision-Language
Models (LVLMs) are facing a prevalent problem with object hallucinations, where the …

StableNet: Distinguishing the hard samples to overcome language priors in visual question answering

Z Yu, J Zhao, C Guo, Y Yang - IET Computer Vision, 2024 - Wiley Online Library
With the booming fields of computer vision and natural language processing, cross-modal
intersections such as visual question answering (VQA) have become very popular …

Combating Visual Question Answering Hallucinations via Robust Multi-Space Co-Debias Learning

J Zhu, Y Liu, H Zhu, H Lin, Y Jiang, Z Zhang… - ACM Multimedia … - openreview.net
The challenge of bias in visual question answering (VQA) has gained considerable attention
in contemporary research. Various intricate bias dependencies, such as modalities and data …

[PDF] Exploring deep learning for multimodal understanding

M Lao - 2023 - scholarlypublications …
[14] Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T.,
Louf, R., Funtowicz, M., et al.: Transformers: State-of-the-art natural language processing. In …