Large language models (LLMs), such as Codex, hold great promise in enhancing programming education by automatically generating feedback for students. We investigate …
The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated …
The ability to derive underlying principles from a handful of observations and then generalize to novel situations, known as inductive reasoning, is central to human …
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …
Machine learning models are widely used, but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so …
Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have …
M Endres, S Fakhoury, S Chakraborty… - Proceedings of the ACM …, 2024 - dl.acm.org
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program's intent …
Machine learning models are widely used but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should …
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program's intent …