Authors
Hitesh Sajnani, Vaibhav Saini, Cristina Lopes
Publication date
2015/6
Journal
Journal of Software: Evolution and Process
Volume
27
Issue
6
Pages
402-429
Description
We propose a new token-based approach for large-scale code clone detection, built on a filtering heuristic that reduces the number of token comparisons needed when two code blocks are compared. We also present a MapReduce-based parallel algorithm that uses the filtering heuristic and scales to thousands of projects. The filtering heuristic is generic and can also be combined with other token-based approaches; in that context, we show how it increases the retrieval speed and decreases the memory usage of index-based approaches. In our experiments on 36 open-source Java projects, we found that: (i) filtering reduces token comparisons by a factor of 10, thereby increasing the speed of clone detection by a factor of 1.5; (ii) the speed-up and scale-up of the parallel approach using filtering are near-linear on a cluster of 2–32 nodes for 150–2800 projects; and (iii) filtering …
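The description does not spell out how the filtering heuristic works. As a hedged illustration only, the sketch below assumes a prefix-filtering scheme of the kind commonly used for token-set similarity. Tokens in every block are sorted in one global order (rarest first). Two blocks get a full token comparison only if their short prefixes share a token; if they share none, the pair provably cannot reach the threshold. Modeling blocks as token sets, measuring similarity as overlap relative to the larger block, and the names `global_order`, `prefix`, `is_clone`, `detect_clones` and the threshold `theta` are all assumptions made for this example. This is not the authors' implementation.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import ceil


def global_order(blocks):
    """Hypothetical helper: rank tokens rarest-first by the number of blocks containing them.

    Ties are broken alphabetically so that every block uses the same total order.
    """
    df = Counter(tok for block in blocks for tok in block)
    return {tok: (count, tok) for tok, count in df.items()}


def prefix(tokens, theta, rank):
    """Keep the first |A| - ceil(theta * |A|) + 1 tokens in global order.

    If two blocks overlap in at least that many tokens, their first common token
    (in global order) must fall inside both prefixes.
    """
    ordered = sorted(tokens, key=rank.__getitem__)
    size = len(ordered)
    return ordered[: size - ceil(theta * size) + 1]


def is_clone(a, b, theta):
    """Exact (assumed) clone test: overlap >= ceil(theta * max(|a|, |b|))."""
    return len(a & b) >= ceil(theta * max(len(a), len(b)))


def detect_clones(blocks, theta=Fraction(8, 10)):
    """Return (clone pairs, number of pairs that passed the filter).

    Blocks are assumed to be non-empty frozensets of tokens. theta is a Fraction
    so that ceil() is not affected by floating-point rounding.
    """
    rank = global_order(blocks)
    prefixes = [set(prefix(block, theta, rank)) for block in blocks]
    clones, candidates = [], 0
    for i, j in combinations(range(len(blocks)), 2):
        if prefixes[i].isdisjoint(prefixes[j]):
            continue  # filtered: these blocks cannot reach the threshold
        candidates += 1  # only these pairs get the full token comparison
        if is_clone(blocks[i], blocks[j], theta):
            clones.append((i, j))
    return clones, candidates


if __name__ == "__main__":
    blocks = [
        frozenset({"for", "i", "n", "sum", "add", "return"}),
        frozenset({"for", "i", "n", "sum", "add", "print"}),
        frozenset({"while", "x", "read", "close", "open", "file"}),
    ]
    clones, candidates = detect_clones(blocks)
    print(clones, candidates)  # [(0, 1)] 1
```

In this toy run only one of the three pairs passes the filter and receives a full comparison. Rarest-first ordering tends to fill prefixes with infrequent tokens, so fewer unrelated pairs share a prefix token. For simplicity the loop still enumerates every pair. An index-based detector would more plausibly look up candidates through an inverted index built over prefix tokens only.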
Total citations
[Citations per year chart, 2014 to 2024]