A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Pitfalls in language models for code intelligence: A taxonomy and survey

X She, Y Liu, Y Zhao, Y He, L Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Modern language models (LMs) have been successfully employed in source code
generation and understanding, leading to a significant increase in research focused on …

Towards efficient fine-tuning of language models with organizational data for automated software review

M Nashaat, J Miller - IEEE Transactions on Software …, 2024 - ieeexplore.ieee.org
Large language models like BERT and GPT possess significant capabilities and potential
impacts across various applications. Software engineers often use these models for code …

Protecting intellectual property of large language model-based code generation APIs via watermarks

Z Li, C Wang, S Wang, C Gao - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org
The rise of large language model-based code generation (LLCG) has enabled various
commercial services and APIs. Training LLCG models is often expensive and time …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …

Split and merge: Aligning position biases in large language model based evaluators

Z Li, C Wang, P Ma, D Wu, S Wang, C Gao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown promise as automated evaluators for assessing
the quality of answers generated by AI systems. However, these LLM-based evaluators …

A survey on large language models for software engineering

Q Zhang, C Fang, Y Xie, Y Zhang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Software Engineering (SE) is the systematic design, development, and maintenance of
software applications, underpinning the digital infrastructure of our modern world. Very …

Reef: A framework for collecting real-world vulnerabilities and fixes

C Wang, Z Li, Y Peng, S Gao, S Chen… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Software plays a crucial role in our daily lives, and therefore the quality and security of
software systems have become increasingly important. However, vulnerabilities in software …

Multistage collaborative knowledge distillation from a large language model for semi-supervised sequence generation

J Zhao, W Zhao, A Drozdov, B Rozonoyer… - Proceedings of the …, 2024 - aclanthology.org
We study semi-supervised sequence generation tasks, where the few labeled examples are
too scarce to finetune a model, and meanwhile, few-shot prompted large language models …

NoteChat: a dataset of synthetic doctor-patient conversations conditioned on clinical notes

J Wang, Z Yao, Z Yang, H Zhou, R Li, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The detailed clinical records drafted by doctors after each patient's visit are crucial for
medical practitioners and researchers. Automating the creation of these notes with language …