While GPT-4V(ision) impressively models both visual and textual information simultaneously, its hallucination behavior has not been systematically assessed. To bridge …
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) face the serious challenge of hallucination, which may …
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics …
J Roberts, T Lüddecke, R Sheikh… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks, but their knowledge and abilities in the geographic and geospatial …
Recent research has offered insights into the extraordinary capabilities of Large Multimodal Models (LMMs) in various general vision and language tasks. There is growing interest in …
MR Taesiri, T Feng, CP Bezemer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities such as visual inputs. This integration augments the …
Scoring student-drawn models is time-consuming. Recently released GPT-4V provides a unique opportunity to advance scientific modeling practices by leveraging the powerful …
This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on three representative tasks: radiology report generation, medical visual question answering, …
X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional capabilities in vision-language tasks. Despite their advanced functionalities, the …