MatPlotAgent: Method and evaluation for LLM-based agentic scientific data visualization

Z Wang, M Xia, L He, H Chen, Y Liu, R Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Chart understanding plays a pivotal role when applying Multimodal Large Language Models
(MLLMs) to real-world tasks such as analyzing scientific papers or financial reports …

Cited by 17 · Related articles · All 4 versions

[PDF] arxiv.org

Agent-as-a-Judge: Evaluate Agents with Agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

Super: Evaluating agents on setting up and executing tasks from research repositories

B Bogin, K Yang, S Gupta, K Richardson… - arXiv preprint arXiv …, 2024 - arxiv.org

Given that Large Language Models (LLMs) have made significant progress in writing code,
can they now be used to autonomously reproduce results from research repositories? Such …

Cited by 2 · Related articles · All 3 versions

[PDF] arxiv.org

PyBench: Evaluating LLM Agent on various real-world coding tasks

Y Zhang, Y Pan, Y Wang, J Cai - arXiv preprint arXiv:2407.16732, 2024 - arxiv.org

The LLM Agent, equipped with a code interpreter, is capable of automatically solving real-
world coding tasks, such as data analysis and image editing. However, existing benchmarks …

Cited by 4 · Related articles · All 2 versions

[PDF] arxiv.org

LAMBDA: A Large Model Based Data Agent

M Sun, R Han, B Jiang, H Qi, D Sun, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free
multi-agent data analysis system that leverages the power of large models. LAMBDA is …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

A Grosnit, A Maraval, J Doran, G Paolo… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to
automate, optimise, and generalise across diverse data science tasks. Fully automated …

Cited by 1 · Related articles · All 2 versions

