Parrot: Multilingual Visual Instruction Tuning

HL Sun, DW Zhou, Y Li, S Lu, C Yi, QG Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has
marked a significant step towards artificial general intelligence. Existing methods mainly …

Dense Connector for MLLMs

H Yao, W Wu, T Yang, YX Song, M Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Do we fully leverage the potential of the visual encoder in Multimodal Large Language
Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has …

TroL: Traversal of Layers for Large Language and Vision Models

BK Lee, S Chung, CW Kim, B Park, YM Ro - arXiv preprint arXiv …, 2024 - arxiv.org
Large language and vision models (LLVMs) have been driven by the generalization power
of large language models (LLMs) and the advent of visual instruction tuning. Along with …

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

J Kim, H Kim, Y Kim, YM Ro - arXiv preprint arXiv:2406.01920, 2024 - arxiv.org
Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual
context understanding and coherent response generation. However, alongside these …