Evaluating instruction-tuned large language models on code comprehension and generation

Z Yuan, J Liu, Q Zi, M Liu, X Peng, Y Lou - arXiv preprint arXiv:2308.01240, 2023 - arxiv.org
In this work, we evaluate 10 open-source instructed LLMs on four representative code
comprehension and generation tasks. We have the following main findings. First, for the zero …

An empirical study on fine-tuning large language models of code for automated program repair

K Huang, X Meng, J Zhang, Y Liu… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
The advent of large language models (LLMs) has opened up new opportunities for
automated program repair (APR). In particular, some recent studies have explored how to …

What do code models memorize? an empirical study on large language models of code

Z Yang, Z Zhao, C Wang, J Shi, D Kim, DG Han… - arXiv preprint arXiv …, 2023 - arxiv.org
The availability of large-scale datasets, advanced architectures, and powerful computational
resources have led to effective code models that automate diverse software engineering …

Unifying the perspectives of nlp and software engineering: A survey on language models for code

Z Zhang, C Chen, B Liu, C Liao, Z Gong… - arXiv preprint arXiv …, 2023 - simg.baai.ac.cn
In this work, we systematically review the recent advancements in code processing with
language models, covering 50+ models, 30+ evaluation tasks, 170+ datasets, and 700 …

SMT solver validation empowered by large pre-trained language models

M Sun, Y Yang, Y Wang, M Wen, H Jia… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
SMT solvers are utilized to check the satisfiability of logic formulas and have been applied in
various crucial domains, including software verification, test case generation, and program …

A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, and maintenance of
software applications, underpinning the digital infrastructure of our modern world. Very …

Robustness, security, privacy, explainability, efficiency, and usability of large language models for code

Z Yang, Z Sun, TZ Yue, P Devanbu, D Lo - arXiv preprint arXiv:2403.07506, 2024 - arxiv.org
Large language models for code (LLM4Code), which demonstrate strong performance (e.g.,
high accuracy) in processing source code, have significantly transformed software …

The devil is in the tails: How long-tailed code distributions impact large language models

X Zhou, K Kim, B Xu, J Liu, DG Han, D Lo - arXiv preprint arXiv …, 2023 - arxiv.org
Learning-based techniques, especially advanced Large Language Models (LLMs) for code,
have gained considerable popularity in various software engineering (SE) tasks. However …

Codegen4libs: A two-stage approach for library-oriented code generation

M Liu, T Yang, Y Lou, X Du, Y Wang… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Automated code generation has been extensively studied in recent literature. In this work,
we first survey 66 participants to motivate a more pragmatic code generation scenario, i.e., …

Automatically Recommend Code Updates: Are We There Yet?

Y Liu, C Tantithamthavorn, Y Liu… - ACM Transactions on …, 2024 - dl.acm.org
In recent years, large pre-trained Language Models of Code (CodeLMs) have shown
promising results on various software engineering tasks. One such task is automatic code …