A/B testing: a systematic literature review

F Quin, D Weyns, M Galster, CC Silva - Journal of Systems and Software, 2024 - Elsevier
A/B testing, also referred to as online controlled experimentation or continuous
experimentation, is a form of hypothesis testing where two variants of a piece of software are …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

PanGu-Coder2: Boosting large language models for code with ranking feedback

B Shen, J Zhang, T Chen, D Zan, B Geng, A Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models for Code (Code LLM) are flourishing. New and powerful models
are released on a weekly basis, demonstrating remarkable performance on the code …

ClassEval: A manually-crafted benchmark for evaluating LLMs on class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we make the first attempt to evaluate LLMs in a more challenging code
generation scenario, i.e., class-level code generation. We first manually construct the first …

Evaluating large language models in class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - Proceedings of the …, 2024 - dl.acm.org
Recently, many large language models (LLMs) have been proposed, showing advanced
proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating …

VerilogEval: Evaluating large language models for Verilog code generation

M Liu, N Pinckney, B Khailany… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
The increasing popularity of large language models (LLMs) has paved the way for their
application in diverse domains. This paper proposes a benchmarking framework tailored …

Towards generating functionally correct code edits from natural language issue descriptions

S Fakhoury, S Chakraborty, M Musuvathi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential
to generate code from natural language descriptions across a wide range of programming …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - arXiv preprint arXiv:2404.12736, 2024 - arxiv.org
The rapid advancements in pre-trained Large Language Models (LLMs) and Large
Multimodal Models (LMMs) have ushered in a new era of intelligent applications …

Exploring and evaluating hallucinations in LLM-powered code generation

F Liu, Y Liu, L Shi, H Huang, R Wang, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of Large Language Models (LLMs) has significantly advanced many applications
on software engineering tasks, particularly in code generation. Despite the promising …

A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …