P Jin, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal …
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices …
Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts …
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current …
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the …
With the rising tide of large language models (LLMs), there has been a growing interest in developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …
The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their …
Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single …
J Li, X Wang, S Zhu, CW Kuo, L Xu, F Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve …