InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …

Building Math Agents with Multi-Turn Iterative Preference Learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Y Zhang, L Chen, G Zheng, Y Gao, R Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Vision Language Models (VLMs) has brought unprecedented advances
in understanding multimodal information. The combination of textual and visual semantics in …

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

F Wang, W Zhou, JY Huang, N Xu, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Direct preference optimization (DPO) has been shown to be an effective method for large
language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal …

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

K Wu, B Jiang, Z Jiang, Q He, D Luo, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) provide a powerful mechanism for
understanding visual information, building on large language models. However, MLLMs are …

LLaVA-Critic: Learning to Evaluate Multimodal Models

T Xiong, X Wang, D Guo, Q Ye, H Fan, Q Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as
a generalist evaluator to assess performance across a wide range of multimodal tasks …

Assessing Audio Hallucination in Large Multimodal Models

S Hanamaki, N Kirishima, S Narumi - osf.io
Speech recognition systems have become increasingly integral in various applications, from
virtual assistants to automated transcription services, necessitating the development of …