Software testing with large language models: Survey, landscape, and vision

J Wang, Y Huang, C Chen, Z Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Pre-trained large language models (LLMs) have recently emerged as a breakthrough
technology in natural language processing and artificial intelligence, with the ability to …

A critical review of large language model on software engineering: An example from ChatGPT and automated program repair

Q Zhang, T Zhang, J Zhai, C Fang, B Yu, W Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have been gaining increasing attention and demonstrated
promising performance across a variety of Software Engineering (SE) tasks, such as …

A systematic literature review on large language models for automated program repair

Q Zhang, C Fang, Y Xie, YX Ma, W Sun, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated Program Repair (APR) attempts to patch software bugs and reduce manual
debugging efforts. Very recently, with the advances in Large Language Models (LLMs), an …

Agent-as-a-Judge: Evaluate Agents with Agents

M Zhuge, C Zhao, D Ashley, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary evaluation techniques are inadequate for agentic systems. These
approaches either focus exclusively on final outcomes--ignoring the step-by-step nature of …

CigaR: Cost-efficient program repair with LLMs

D Hidvégi, K Etemadi, S Bobadilla… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have proven to be effective at automated program repair
(APR). However, using LLMs can be highly costly, with companies invoicing users by the …

Large language models of code fail at completing code with potential bugs

T Dinh, J Zhao, S Tan, R Negrinho… - Advances in …, 2024 - proceedings.neurips.cc
Large language models of code (Code-LLMs) have recently brought tremendous advances
to code completion, a fundamental feature of programming assistance and code …

MdEval: Massively multilingual code debugging

S Liu, L Chai, J Yang, J Shi, H Zhu, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Code large language models (LLMs) have made significant progress in code debugging by
directly generating the correct code based on the buggy code snippet. Programming …

SUPER: Evaluating agents on setting up and executing tasks from research repositories

B Bogin, K Yang, S Gupta, K Richardson… - arXiv preprint arXiv …, 2024 - arxiv.org
Given that Large Language Models (LLMs) have made significant progress in writing code,
can they now be used to autonomously reproduce results from research repositories? Such …

Better context makes better code language models: A case study on function call argument completion

H Pei, J Zhao, L Lausen, S Zha, G Karypis - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Pretrained code language models have enabled great progress towards program synthesis.
However, common approaches only consider in-file local context and thus miss information …

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

H Wang, Z Liu, S Wang, G Cui, N Ding… - Findings of the …, 2024 - aclanthology.org
This paper introduces INTERVENOR (INTERactiVE chaiN Of Repair), a system designed to
emulate the interactive code repair processes observed in humans, encompassing both …