CoderEval: A benchmark of pragmatic code generation with generative pre-trained models

F Quin, D Weyns, M Galster, CC Silva - Journal of Systems and Software, 2024 - Elsevier

A/B testing, also referred to as online controlled experimentation or continuous
experimentation, is a form of hypothesis testing where two variants of a piece of software are …

Cited by 189 · Related articles · All 6 versions

[PDF] arxiv.org

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org

Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Cited by 76 · Related articles · All 4 versions

[PDF] arxiv.org

PanGu-Coder2: Boosting large language models for code with ranking feedback

B Shen, J Zhang, T Chen, D Zan, B Geng, A Fu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models for Code (Code LLM) are flourishing. New and powerful models
are released on a weekly basis, demonstrating remarkable performance on the code …

Cited by 44 · Related articles · All 2 versions

[PDF] arxiv.org

ClassEval: A manually-crafted benchmark for evaluating LLMs on class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

In this work, we make the first attempt to evaluate LLMs in a more challenging code
generation scenario, i.e., class-level code generation. We first manually construct the first …

Cited by 42 · Related articles · All 3 versions

[PDF] github.io

Evaluating large language models in class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - Proceedings of the …, 2024 - dl.acm.org

Recently, many large language models (LLMs) have been proposed, showing advanced
proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating …

Cited by 10 · Related articles · All 5 versions

[PDF] arxiv.org

VerilogEval: Evaluating large language models for Verilog code generation

M Liu, N Pinckney, B Khailany… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org

The increasing popularity of large language models (LLMs) has paved the way for their
application in diverse domains. This paper proposes a benchmarking framework tailored …

Cited by 36 · Related articles · All 3 versions

[PDF] arxiv.org

Towards generating functionally correct code edits from natural language issue descriptions

S Fakhoury, S Chakraborty, M Musuvathi… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential
to generate code from natural language descriptions across a wide range of programming …

Cited by 17 · Related articles · All 2 versions

[PDF] arxiv.org

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - arXiv preprint arXiv:2404.12736, 2024 - arxiv.org

The rapid advancements in pre-trained Large Language Models (LLMs) and Large
Multimodal Models (LMMs) have ushered in a new era of intelligent applications …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Exploring and evaluating hallucinations in LLM-powered code generation

F Liu, Y Liu, L Shi, H Huang, R Wang, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rise of Large Language Models (LLMs) has significantly advanced many applications
on software engineering tasks, particularly in code generation. Despite the promising …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org

General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …

Cited by 22 · Related articles · All 2 versions
