OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

“What it wants me to say”: Bridging the abstraction gap between end-user programmers and code-generating large language models

MX Liu, A Sarkar, C Negreanu, B Zorn… - Proceedings of the …, 2023 - dl.acm.org
Code-generating large language models map natural language to code. However, only a
small portion of the infinite space of naturalistic utterances is effective at guiding code …

MultiPL-E: A scalable and polyglot approach to benchmarking neural code generation

F Cassano, J Gouwar, D Nguyen… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Large language models have demonstrated the ability to generate both natural language
and programming language text. Although contemporary code generation models are …

Large language models meet NL2Code: A survey

D Zan, B Chen, F Zhang, D Lu, B Wu, B Guan… - arXiv preprint arXiv …, 2022 - arxiv.org
The task of generating code from a natural language description, or NL2Code, is considered
a pressing and significant challenge in code intelligence. Thanks to the rapid development …

Multi-lingual evaluation of code generation models

B Athiwaratkun, SK Gouda, Z Wang, X Li, Y Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
We present new benchmarks on evaluation code generation models: MBXP and Multilingual
HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are …

Deep learning based code generation methods: A literature review

Z Yang, S Chen, C Gao, Z Li, G Li, R Lv - arXiv preprint arXiv:2303.01056, 2023 - arxiv.org
Code Generation aims at generating relevant code fragments according to given natural
language descriptions. In the process of software development, there exist a large number of …

Execution-based evaluation for open-domain code generation

Z Wang, S Zhou, D Fried, G Neubig - arXiv preprint arXiv:2212.10481, 2022 - arxiv.org
To extend the scope of coding queries to more realistic settings, we propose ODEX, the first
Open-Domain EXecution-based natural language (NL) to Python code generation dataset …

xCodeEval: A large scale multilingual multitask benchmark for code understanding, generation, translation and retrieval

MAM Khan, MS Bari, XL Do, W Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, pre-trained large language models (LLMs) have shown impressive abilities in
generating codes from natural language descriptions, repairing buggy codes, translating …

ERNIE-Code: Beyond English-centric cross-lingual pretraining for programming languages

Y Chai, S Wang, C Pang, Y Sun, H Tian… - arXiv preprint arXiv …, 2022 - arxiv.org
Software engineers working with the same programming language (PL) may speak different
natural languages (NLs) and vice versa, erecting huge barriers to communication and …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …