MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining
visual recognition and language understanding to generate content that is not only coherent …

RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback

Y Chen, K Sikka, M Cogswell, H Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present DRESS, a large vision-language model (LVLM) that innovatively exploits natural
language feedback (NLF) from Large Language Models to enhance its alignment and …

LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model

Y Zhu, M Zhu, N Liu, Z Ou, X Mou, J Tang - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce LLaVA-$\phi$ (LLaVA-Phi), an efficient multi-modal assistant that
harnesses the power of the recently advanced small language model, Phi-2, to facilitate …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …

Aligning modalities in vision large language models via preference fine-tuning

Y Zhou, C Cui, R Rafailov, C Finn, H Yao - arXiv preprint arXiv:2402.11411, 2024 - arxiv.org
Instruction-following Vision Large Language Models (VLLMs) have recently achieved significant
progress on a variety of tasks. These approaches merge strong pre-trained vision …