RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

SEED-Bench: Benchmarking Multimodal Large Language Models

B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs), building upon the foundation of powerful large
language models (LLMs), have recently demonstrated exceptional capabilities in generating …

OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Q Huang, X Dong, P Zhang, B Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Hallucination, posed as a pervasive challenge of multi-modal large language models
(MLLMs), has significantly impeded their real-world usage that demands precise judgment …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Aligning large multimodal models with factually augmented RLHF

Z Sun, S Shen, S Cao, H Liu, C Li, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Multimodal Models (LMM) are built across modalities and the misalignment between
two modalities can result in "hallucination", generating textual outputs that are not grounded …

SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models

Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint
mixing of model weights, tuning tasks, and visual embeddings. First, for stronger vision …

Hallucination augmented contrastive learning for multimodal large language model

C Jiang, H Xu, M Dong, J Chen, W Ye… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal large language models (MLLMs) have been shown to efficiently integrate
natural language with visual information to handle multi-modal tasks. However, MLLMs still …

mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However, previous methods have …

Aligning large multi-modal model with robust instruction tuning

F Liu, K Lin, L Li, J Wang, Y Yacoob, L Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the promising progress in multi-modal tasks, current large multi-modal models
(LMM) are prone to hallucinating inconsistent descriptions with respect to the associated …

Mementos: A comprehensive benchmark for multimodal large language model reasoning over image sequences

X Wang, Y Zhou, X Liu, H Lu, Y Xu, F He… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a
variety of visual-language tasks. However, current MLLM benchmarks are predominantly …