MHPP: Exploring the capabilities and limitations of language models beyond basic code generation

J Dai, J Lu, Y Feng, D Huang, G Zeng, R Ruan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have greatly improved code
generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0 …

Autopsv: Automated process-supervised verifier

J Lu, Z Dou, H Wang, Z Cao, J Dai… - The Thirty-eighth …, 2024 - openreview.net
In this work, we propose a novel method named Automated Process-Supervised Verifier (AutoPSV) to enhance the reasoning …

ACECode: A Reinforcement Learning Framework for Aligning Code Efficiency and Correctness in Code Language Models

C Yang, HJ Kang, J Shi, D Lo - arXiv preprint arXiv:2412.17264, 2024 - arxiv.org
CodeLLMs have demonstrated remarkable advancements in software engineering tasks.
However, while these models can generate functionally correct code, they often produce …

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

Y Yao, H Wu, Z Guo, B Zhou, J Gao… - First Conference on … - openreview.net
Large language models (LLMs) have demonstrated outstanding performance across various
tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic …