MLLM-as-a-judge: Assessing multimodal LLM-as-a-judge with vision-language benchmark

D Chen, R Chen, S Zhang, Y Liu, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have gained significant attention recently,
showing remarkable potential in artificial general intelligence. However, assessing the utility …

Holistic analysis of hallucination in GPT-4V(ision): Bias and interference challenges

C Cui, Y Zhou, X Yang, S Wu, L Zhang, J Zou… - arXiv preprint arXiv …, 2023 - arxiv.org
While GPT-4V(ision) impressively models both visual and textual information
simultaneously, its hallucination behavior has not been systematically assessed. To bridge …

An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation

J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite making significant progress in multi-modal tasks, current Multi-modal Large
Language Models (MLLMs) encounter the significant challenge of hallucination, which may …

Progress and opportunities of foundation models in bioinformatics

Q Li, Z Hu, Y Wang, L Li, Y Fan, I King… - Briefings in …, 2024 - academic.oup.com
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly
through foundation models (FMs), which address longstanding challenges in bioinformatics …

Charting new territories: Exploring the geographic and geospatial capabilities of multimodal llms

J Roberts, T Lüddecke, R Sheikh… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have shown remarkable capabilities across a
broad range of tasks but their knowledge and abilities in the geographic and geospatial …

GPT-4V(ision) as a social media analysis engine

H Lyu, J Huang, D Zhang, Y Yu, X Mou, J Pan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …

GlitchBench: Can large multimodal models detect video game glitches?

MR Taesiri, T Feng, CP Bezemer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have evolved from large language models (LLMs) to
integrate multiple input modalities such as visual inputs. This integration augments the …

NERIF: GPT-4V for automatic scoring of drawn models

GG Lee, X Zhai - arXiv preprint arXiv:2311.12990, 2023 - arxiv.org
Scoring student-drawn models is time-consuming. Recently released GPT-4V provides a
unique opportunity to advance scientific modeling practices by leveraging the powerful …

A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis

Y Liu, Y Li, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - Meta-Radiology, 2024 - Elsevier
This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on
three representative tasks: radiology report generation, medical visual question answering …

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …