RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …

SEED-Bench: Benchmarking Multimodal Large Language Models

B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs), building upon the foundation of powerful large
language models (LLMs), have recently demonstrated exceptional capabilities in generating …

OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Q Huang, X Dong, P Zhang, B Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Hallucination, posed as a pervasive challenge of multi-modal large language models
(MLLMs), has significantly impeded their real-world usage that demands precise judgment …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Aligning large multimodal models with factually augmented RLHF

Z Sun, S Shen, S Cao, H Liu, C Li, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Multimodal Models (LMM) are built across modalities and the misalignment between
two modalities can result in "hallucination", generating textual outputs that are not grounded …

SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models

Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint
mixing of model weights, tuning tasks, and visual embeddings. First, for stronger vision …

Hallucination augmented contrastive learning for multimodal large language model

C Jiang, H Xu, M Dong, J Chen, W Ye… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal large language models (MLLMs) have been shown to efficiently integrate
natural language with visual information to handle multi-modal tasks. However, MLLMs still …

mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However, previous methods have …

Aligning large multi-modal model with robust instruction tuning

F Liu, K Lin, L Li, J Wang, Y Yacoob, L Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the promising progress in multi-modal tasks, current large multi-modal models
(LMM) are prone to hallucinating inconsistent descriptions with respect to the associated …

Mementos: A comprehensive benchmark for multimodal large language model reasoning over image sequences

X Wang, Y Zhou, X Liu, H Lu, Y Xu, F He… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a
variety of visual-language tasks. However, current MLLM benchmarks are predominantly …