DS-1000: A natural and reliable benchmark for data science code generation

Y Lai, C Li, Y Wang, T Zhang, R Zhong… - International …, 2023 - proceedings.mlr.press
We introduce DS-1000, a code generation benchmark with a thousand data science
problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior …

CERT: continual pre-training on sketches for library-oriented code generation

D Zan, B Chen, D Yang, Z Lin, M Kim, B Guan… - arXiv preprint arXiv …, 2022 - arxiv.org
Code generation is a longstanding challenge, aiming to generate a code snippet based on a
natural language description. Usually, expensive text-code paired data is essential for …

Unifying the perspectives of NLP and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
In this work we systematically review the recent advancements in code processing with
language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 …

Execution-based evaluation for data science code generation models

J Huang, C Wang, J Zhang, C Yan, H Cui… - arXiv preprint arXiv …, 2022 - arxiv.org
Code generation models can benefit data scientists' productivity by automatically generating
code from context and text descriptions. An important measure of the modeling progress is …

TransRepair: Context-aware program repair for compilation errors

X Li, S Liu, R Feng, G Meng, X Xie, K Chen… - Proceedings of the 37th …, 2022 - dl.acm.org
Automatically fixing compilation errors can greatly raise the productivity of software
development, by guiding the novice or AI programmers to write and debug code. Recently …

Where Are Large Language Models for Code Generation on GitHub?

X Yu, L Liu, X Hu, JW Keung, J Liu, X Xia - arXiv preprint arXiv:2406.19544, 2024 - arxiv.org
The increasing use of Large Language Models (LLMs) in software development has
garnered significant attention from researchers assessing the quality of the code they …

A survey of neural code intelligence: Paradigms, advances and beyond

Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural Code Intelligence (leveraging deep learning to understand, generate, and optimize
code) holds immense potential for transformative impacts on the whole society. Bridging the …

Contextualized Data-Wrangling Code Generation in Computational Notebooks

J Huang, D Guo, C Wang, J Gu, S Lu, JP Inala… - Proceedings of the 39th …, 2024 - dl.acm.org
Data wrangling, the process of preparing raw data for further analysis in computational
notebooks, is a crucial yet time-consuming step in data science. Code generation has the …

Non-programmers can label programs indirectly via active examples: A case study with text-to-SQL

R Zhong, C Snell, D Klein, J Eisner - Proceedings of the 2023 …, 2023 - aclanthology.org
Can non-programmers annotate natural language utterances with complex programs that
represent their meaning? We introduce APEL, a framework in which non-programmers …

Mind the Gap between the Application Track and the Real World

A Ganesh, J Cao, EM Perkoff, R Southwell… - Proceedings of the …, 2023 - aclanthology.org
Recent advances in NLP have led to a rise in inter-disciplinary and application-oriented
research. While this demonstrates the growing real-world impact of the field, research …