CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Z Wang, M Xia, L He, H Chen, Y Liu, R Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Chart understanding plays a pivotal role when applying Multimodal Large Language Models
(MLLMs) to real-world tasks such as analyzing scientific papers or financial reports …

Agent-as-a-Judge: Evaluate Agents with Agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

B Bogin, K Yang, S Gupta, K Richardson… - arXiv preprint arXiv …, 2024 - arxiv.org
Given that Large Language Models (LLMs) have made significant progress in writing code,
can they now be used to autonomously reproduce results from research repositories? Such …

PyBench: Evaluating LLM Agent on various real-world coding tasks

Y Zhang, Y Pan, Y Wang, J Cai - arXiv preprint arXiv:2407.16732, 2024 - arxiv.org
The LLM Agent, equipped with a code interpreter, is capable of automatically solving
real-world coding tasks, such as data analysis and image editing. However, existing benchmarks …

LAMBDA: A Large Model Based Data Agent

M Sun, R Han, B Jiang, H Qi, D Sun, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free
multi-agent data analysis system that leverages the power of large models. LAMBDA is …

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

A Grosnit, A Maraval, J Doran, G Paolo… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to
automate, optimise, and generalise across diverse data science tasks. Fully automated …

Visualizing Large Language Models: A Brief Survey

AMP Brasoveanu, A Scharl, LJB Nixon… - 2024 28th …, 2024 - ieeexplore.ieee.org
This paper explores the current landscape of visualizing large language models (LLMs). The
main objective was threefold. Firstly, we investigate how we can visualize LLM-specific …

A Survey on Human-Centric LLMs

JY Wang, N Sukiennik, T Li, W Su, Q Hao, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of large language models (LLMs) and their capacity to simulate human
cognition and behavior has given rise to LLM-based frameworks and tools that are …

From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing

J Wei, C Tan, Q Chen, G Wu, S Li, Z Gao, L Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the task of text-to-diagram generation, which focuses on creating structured
visual representations directly from textual descriptions. Existing approaches in text-to …

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

C Shi, C Yang, Y Liu, B Shui, J Wang, M Jing… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded
code generation capabilities of large multimodal models (LMMs). ChartMimic utilizes …