MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, particularly at the function level. For instance, GPT-4o has achieved a 91.0 …

ErrorRadar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

Process-driven autoformalization in Lean 4

J Lu, Y Wan, Z Liu, Y Huang, J Xiong, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoformalization, the conversion of natural language mathematics into formal languages,
offers significant potential for advancing mathematical reasoning. However, existing efforts …

CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing

C Yang, C Zhao, Q Gu, D Zhou - arXiv preprint arXiv:2410.16670, 2024 - arxiv.org
Sequential reasoning in agent systems has been significantly advanced by large language
models (LLMs), yet existing approaches face limitations. Reflection-driven reasoning relies …

Math for AI: On the Generalization of Learning Mathematical Problem Solving

R Zhou, M Xu, S Chen, J Liu, Y Li, X Lin, Z Chen… - The 4th Workshop on … - openreview.net
There has been a growing interest in enhancing the mathematical problem-solving (MPS)
capabilities of LLMs. While some researchers focus on developing specialized math models …