Large language models for software engineering: A systematic literature review

X Hou, Y Zhao, Y Liu, Z Yang, K Wang, L Li… - ACM Transactions on …, 2023 - dl.acm.org
Large Language Models (LLMs) have significantly impacted numerous domains, including
Software Engineering (SE). Many recent publications have explored LLMs applied to …

Software testing with large language models: Survey, landscape, and vision

J Wang, Y Huang, C Chen, Z Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Pre-trained large language models (LLMs) have recently emerged as a breakthrough
technology in natural language processing and artificial intelligence, with the ability to …

CodeT5+: Open code large language models for code understanding and generation

Y Wang, H Le, AD Gotmare, NDQ Bui, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) pretrained on vast source code have achieved prominent
progress in code intelligence. However, existing code LLMs have two main limitations in …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
The Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

UniXcoder: Unified cross-modal pre-training for code representation

D Guo, S Lu, N Duan, Y Wang, M Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained models for programming languages have recently demonstrated great success
on code intelligence. To support both code-related understanding and generation tasks …

Impact of code language models on automated program repair

N Jiang, K Liu, T Lutellier, L Tan - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Automated program repair (APR) aims to help developers improve software reliability by
generating patches for buggy programs. Although many code language models (CLMs) are …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Text and code embeddings by contrastive pre-training

A Neelakantan, T Xu, R Puri, A Radford, JM Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …

Unified pre-training for program understanding and generation

WU Ahmad, S Chakraborty, B Ray… - arXiv preprint arXiv …, 2021 - arxiv.org
Code summarization and generation empower conversion between programming language
(PL) and natural language (NL), while code translation facilitates the migration of legacy code …

CodeXGLUE: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …