H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have recently shown encouraging progress with visual instruction tuning. In this paper, we present the first systematic study to investigate the design …
The exponential growth of large language models (LLMs) has opened up numerous possibilities for multi-modal AGI systems. However, the progress in vision and vision …
Large Multimodal Models (LMMs) are built across modalities, and the misalignment between the two modalities can result in "hallucination": generating textual outputs that are not grounded …
J Cha, W Kang, J Mun, B Roh - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual …
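For context only: the snippet above names the projector's role but not its design. A minimal sketch of one common projector design, a two-layer MLP that maps vision-encoder patch features into the LLM's embedding space, is shown below. It is not the module proposed in this paper; all dimensions and names are assumptions for illustration.

```python
# Hypothetical sketch of a common visual projector (two-layer MLP), not the
# projector proposed in the paper above. Dimensions are assumptions.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Map per-patch vision features into the LLM's token-embedding space.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from a vision encoder.
        # Returns visual "tokens" of shape (batch, num_patches, llm_dim) that can
        # be concatenated with text embeddings fed to the LLM.
        return self.proj(patch_features)

# Example: 576 patch features (e.g., from a ViT) projected to a 4096-dim LLM space.
tokens = MLPProjector()(torch.randn(1, 576, 1024))
```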
We present Emu, a multimodal foundation model that seamlessly generates images and text in a multimodal context. This omnivore model can take in any single-modality or multimodal …
Medical images and radiology reports are essential for physicians to diagnose medical conditions. However, the vast diversity and cross-source heterogeneity inherent in these …
Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a …
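For context only: one well-known family of post-retrieval QPP baselines scores a query by the dispersion of its top-k retrieval scores (NQC-style predictors). The sketch below is such a baseline, not the method of the paper in this entry; the example scores are made up.

```python
# Hypothetical sketch of a simple post-retrieval QPP baseline: predict query
# difficulty from the spread of the top-k retrieval scores (NQC-style).
# Illustrative only; not the method referenced in the entry above.
import statistics

def score_dispersion_qpp(retrieval_scores: list[float], k: int = 100) -> float:
    """Higher dispersion among the top-k scores is taken as a proxy for
    better retrieval quality, without any human relevance judgments."""
    top_k = sorted(retrieval_scores, reverse=True)[:k]
    if len(top_k) < 2:
        return 0.0
    return statistics.pstdev(top_k)

# Example with made-up retrieval scores for two queries.
print(score_dispersion_qpp([12.3, 11.8, 7.2, 3.1, 2.9]))  # spread-out scores -> higher
print(score_dispersion_qpp([5.1, 5.0, 4.9, 4.9, 4.8]))    # flat scores -> lower
```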
F Ma, Y Zhou, Y Zhang, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Inspired by the remarkable progress achieved by recent Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) take LLMs as their brains and have achieved …
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks. The prevailing …