Plausible may not be faithful: Probing object hallucination in vision-language pre-training

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org

Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Cited by 2078 · Related articles · All 7 versions

[PDF] arxiv.org

A survey of hallucination in large foundation models

V Rawte, A Sheth, A Das - arXiv preprint arXiv:2309.05922, 2023 - arxiv.org

Hallucination in a foundation model (FM) refers to the generation of content that strays from
factual reality or includes fabricated information. This survey paper provides an extensive …

Cited by 177 · Related articles · All 2 versions

[PDF] arxiv.org

Evaluating object hallucination in large vision-language models

Y Li, Y Du, K Zhou, J Wang, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …

Cited by 338 · Related articles · All 6 versions

[PDF] arxiv.org

DreamLLM: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

Cited by 64 · Related articles · All 4 versions

[PDF] thecvf.com

DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback

Y Chen, K Sikka, M Cogswell, H Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural
Language feedback (NLF) from Large Language Models to enhance its alignment and …

Cited by 26 · Related articles · All 3 versions

[PDF] arxiv.org

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …

Cited by 46 · Related articles · All 2 versions

[PDF] aaai.org

Visual instruction tuning with polite flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large
Language Models (LLMs) using an assortment of annotated downstream vision-language …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org

Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Cited by 23 · Related articles · All 2 versions

[PDF] arxiv.org

A survey of reasoning with foundation models

J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Cited by 12 · Related articles · All 2 versions
