Authors
Olfat Mirza, Mike Joy
Publication date
2015
Conference
Plagiarism across Europe and Beyond
Pages
53–61
Description
The enormous growth in available online code resources has created new challenges for detecting plagiarism in program source code. Several software applications can detect source-code similarity using different detection methods. However, few current tools detect every kind of plagiarism attack. The aim of this thesis is, therefore, to enhance methods for plagiarism detection in source code using a style-analysis approach that has previously been used to detect authorship. Very few large source-code datasets are suitable for research purposes; two such datasets are the BlackBox dataset and the SOCO (Detection of SOurce COde) dataset. SOCO is a benchmark dataset that contains groups of similar source-code files that can be considered plagiarised, and it has been used in authorship and plagiarism detection competitions. In the first part of the thesis, the suitability of BlackBox as a source of datasets for testing plagiarism detection is explored. The files in BlackBox were analysed and visualised in order to evaluate its suitability as a dataset for this research. The analysis aimed to identify similar source-code files, and therefore to detect groups of Java files within BlackBox that can be used for evaluating the performance of source-code plagiarism detection methods. In the second part of the thesis, a plagiarism detection framework, "the Metric-File Matrix Framework (MFM)", is proposed. The MFM framework is designed to overcome some of the limitations of existing plagiarism detection methods by 1) proposing a new set of metrics which consider structural and stylistic similarities; and 2 …
Total citations
[Chart: citations per year, 2020–2024]