A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …

A systematic literature review on explainability for machine/deep learning-based software engineering research

S Cao, X Sun, R Widyasari, D Lo, X Wu, L Bo… - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in
Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment …

Robustness, security, privacy, explainability, efficiency, and usability of large language models for code

Z Yang, Z Sun, TZ Yue, P Devanbu, D Lo - arXiv preprint arXiv:2403.07506, 2024 - arxiv.org
Large language models for code (LLM4Code), which demonstrate strong performance (e.g.,
high accuracy) in processing source code, have significantly transformed software …

Toward a theory of causation for interpreting neural code models

DN Palacio, A Velasco, N Cooper… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing
from research prototypes to commercial developer tools. As such, understanding the …

Causal Evaluation of Language Models

S Chen, B Peng, M Chen, R Wang, M Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Causal reasoning is viewed as crucial for achieving human-level machine intelligence.
Recent advances in language models have expanded the horizons of artificial intelligence …

Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations

DN Palacio, D Rodriguez-Cardenas, A Velasco… - arXiv preprint arXiv …, 2024 - arxiv.org
Trustworthiness and interpretability are inextricably linked concepts for LLMs. The more
interpretable an LLM is, the more trustworthy it becomes. However, current techniques for …

Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code?

A Velasco, DN Palacio… - Proceedings of the …, 2024 - dl.acm.org
This paper discusses the limitations of evaluating Masked Language Models (MLMs) in
code completion tasks. We highlight that relying on accuracy-based measurements may …

Knowledge-based Consistency Testing of Large Language Models

SS Rajan, E Soremekun, S Chattopadhyay - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we systematically expose and measure the inconsistency and knowledge gaps
of Large Language Models (LLMs). Specifically, we propose an automated testing …

Beyond Accuracy and Robustness Metrics for Large Language Models for Code

D Rodriguez-Cardenas - Proceedings of the 2024 IEEE/ACM 46th …, 2024 - dl.acm.org
In recent years, Large Language Models for code (LLMc) have transformed the landscape of
software engineering (SE), demonstrating significant efficacy in tasks such as code …

Code Syntax Understanding in Large Language Models

C Granger - 2024 - scholarworks.wm.edu
In recent years, automated software engineering tasks have been accomplished using Large
Language Models trained on source code, such as Seq2Seq, LSTM, GPT, T5, BART and …