Pitfalls in language models for code intelligence: A taxonomy and survey

X She, Y Liu, Y Zhao, Y He, L Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Modern language models (LMs) have been successfully employed in source code
generation and understanding, leading to a significant increase in research focused on …

Generative type inference for python

Y Peng, C Wang, W Wang, C Gao… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Python is a popular dynamic programming language, evidenced by its ranking as the
second most commonly used language on GitHub. However, its dynamic type system can …

When less is enough: Positive and unlabeled learning model for vulnerability detection

XC Wen, X Wang, C Gao, S Wang… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Automated code vulnerability detection has gained increasing attention in recent years.
Deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have …

A Catalog of Data Smells for Coding Tasks

A Vitale, R Oliveto, S Scalabrino - ACM Transactions on Software …, 2024 - dl.acm.org
Large Language Models (LLMs) are increasingly becoming fundamental in supporting
software developers in coding tasks. The massive datasets used for training LLMs are often …

Are we ready to embrace generative AI for software Q&A?

B Xu, TD Nguyen, T Le-Cong, T Hoang… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Stack Overflow, the world's largest software Q&A (SQA) website, is facing a significant traffic
drop due to the emergence of generative AI techniques. ChatGPT is banned by Stack …

Causal Evaluation of Language Models

S Chen, B Peng, M Chen, R Wang, M Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Causal reasoning is viewed as crucial for achieving human-level machine intelligence.
Recent advances in language models have expanded the horizons of artificial intelligence …

Codeart: Better code models by attention regularization when symbols are lacking

Z Su, X Xu, Z Huang, Z Zhang, Y Ye, J Huang… - Proceedings of the …, 2024 - dl.acm.org
Transformer based code models have impressive performance in many software
engineering tasks. However, their effectiveness degrades when symbols are missing or not …

Improving Source Code Pre-training via Type-Specific Masking

W Zou, Q Li, C Li, J Ge, X Chen, LG Huang… - ACM Transactions on …, 2024 - dl.acm.org
The masked language modeling (MLM) task is widely recognized as one of the most
effective pre-training tasks and currently derives many variants in the software engineering …

Causal reasoning in Software Quality Assurance: A systematic review

L Giamattei, A Guerriero, R Pietrantuono… - Information and Software …, 2024 - Elsevier
Context: Software Quality Assurance (SQA) is a fundamental part of software
engineering to ensure stakeholders that software products work as expected after release in …

Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training

Y Wang, Y Chen, Y Zhao, Z Gong, J Chen… - Proceedings of the 39th …, 2024 - dl.acm.org
Deep code models (DCMs) have achieved impressive accomplishments and have been
widely applied to various code-related tasks. However, existing studies show that some …