S Jiang, Y Zhang, R Chen, Y Jin, Z Liu - arXiv preprint arXiv:2410.15334, 2024 - arxiv.org
Direct Preference Optimization (DPO) is effective for aligning large language models (LLMs), but when applied to multimodal large language models (MLLMs), it often favors text over image information …
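For context, the standard DPO objective the abstract refers to can be sketched as below. This is a minimal illustration of the usual formulation (Rafailov et al., 2023), not the paper's own code; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (illustrative sketch).

    Each argument is a batch of summed log-probabilities of a response
    under the trainable policy or the frozen reference model; `beta`
    scales the implicit reward and the pull toward the reference.
    """
    # Implicit rewards: how much more the policy prefers each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Maximize the log-sigmoid of the margin between the preferred
    # (chosen) and dispreferred (rejected) responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

One way to read the modality-bias claim in the abstract: both the chosen and rejected log-probabilities condition on the same image, so nothing in the margin itself forces the preference signal to depend on visual evidence rather than text alone.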