On Inter-dataset Code Duplication and Data Leakage in Large Language Models

JAH López, B Chen, T Sharma, D Varró - arXiv preprint arXiv:2401.07930, 2024 - arxiv.org
Motivation. Large language models (LLMs) have exhibited remarkable proficiency in diverse
software engineering (SE) tasks. Handling such tasks typically involves acquiring …

COMET: Generating Commit Messages using Delta Graph Context Representation

AR Mandli, S Rajput, T Sharma - arXiv preprint arXiv:2402.01841, 2024 - arxiv.org
Commit messages explain code changes in a commit and facilitate collaboration among
developers. Several commit message generation approaches have been proposed; …

Machine Learning Techniques for Python Source Code Vulnerability Detection

T Farasat, J Posegga - arXiv preprint arXiv:2404.09537, 2024 - arxiv.org
Software vulnerabilities are a fundamental reason for the prevalence of cyber attacks and
their identification is a crucial yet challenging problem in cyber security. In this paper, we …

On the Path to Buffer Overflow Detection by Model Checking the Stack of Binary Programs

L Ferreirinha, I Medeiros - ENASE, 2024 - scitepress.org
The C programming language, prevalent in Cyber-Physical Systems, is crucial for system
control where reliability is critical. However, it is notably susceptible to vulnerabilities …