MME-Survey: A comprehensive survey on evaluation of multimodal LLMs

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

Advancing multimodal large language models in chart question answering with visualization-referenced instruction tuning

X Zeng, H Lin, Y Ye, W Zeng - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Emerging multimodal large language models (MLLMs) exhibit great potential for chart
question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e. …

MMSci: A multimodal multi-discipline dataset for PhD-level scientific comprehension

Z Li, X Yang, K Choi, W Zhu, R Hsieh… - AI for Accelerated …, 2024 - openreview.net
The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models
(LMMs) has heightened the demand for AI-based scientific assistants capable of …

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

F Zhang, L Wu, H Bai, G Lin, X Li, X Yu, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Coding tasks have been valuable for evaluating Large Language Models (LLMs), as they
demand the comprehension of high-level instructions, complex reasoning, and the …

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

Y Hao, J Gu, HW Wang, L Li, Z Yang, L Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
The ability to organically reason over and with both text and images is a pillar of human
intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such …

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

X Zhao, X Luo, Q Shi, C Chen, S Wang, W Che… - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in
chart understanding tasks. However, interpreting charts with textual descriptions often leads …

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?

L Zhang, S Eger, Y Cheng, W Zhai, J Belouadi… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (LLMs) have demonstrated impressive capabilities in
generating high-quality images from textual instructions. However, their performance in …

Can LLMs Understand Time Series Anomalies?

Z Zhou, R Yu - arXiv preprint arXiv:2410.05440, 2024 - arxiv.org
Large Language Models (LLMs) have gained popularity in time series forecasting, but their
potential for anomaly detection remains largely unexplored. Our study investigates whether …

Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code

T Galimzyanov, S Titov, Y Golubev… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces the human-curated PandasPlotBench dataset, designed to evaluate
language models' effectiveness as assistants in visual data exploration. Our benchmark …