MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, particularly at the function level. For instance, GPT-4o has achieved a 91.0 …

ErrorRadar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

Process-driven autoformalization in Lean 4

J Lu, Y Wan, Z Liu, Y Huang, J Xiong, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoformalization, the conversion of natural language mathematics into formal languages,
offers significant potential for advancing mathematical reasoning. However, existing efforts …

CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing

C Yang, C Zhao, Q Gu, D Zhou - arXiv preprint arXiv:2410.16670, 2024 - arxiv.org
Sequential reasoning in agent systems has been significantly advanced by large language
models (LLMs), yet existing approaches face limitations. Reflection-driven reasoning relies …

Math for AI: On the Generalization of Learning Mathematical Problem Solving

R Zhou, M Xu, S Chen, J Liu, Y Li, X Lin, Z Chen… - The 4th Workshop on … - openreview.net
There has been a growing interest in enhancing the mathematical problem-solving (MPS)
capabilities of LLMs. While some researchers focus on developing specialized math models …