Comparison and Evaluation of Clone Detection Techniques with Different Code Representations.

ICSE(2023)

引用 0|浏览72
暂无评分
摘要
As one of bad smells in code, code clones may increase the cost of software maintenance and the risk of vulnerability propagation. In the past two decades, numerous clone detection technologies have been proposed. They can be divided into text-based, token-based, tree-based, and graph-based approaches according to their code representations. Different code representations abstract the code details from different perspectives. However, it is unclear which code representation is more effective in detecting code clones and how to combine different code representations to achieve ideal performance. In this paper, we present an empirical study to compare the clone detection ability of different code representations. Specifically, we reproduce 12 clone detection algorithms and divide them into different groups according to their code representations. After analyzing the empirical results, we find that token and tree representations can perform better than graph representation when detecting simple code clones. However, when the code complexity of a code pair increases, graph representation becomes more effective. To make our findings more practical, we perform manual analysis on open-source projects to seek a possible distribution of different clone types in the open-source community. Through the results, we observe that most clone pairs belong to simple code clones. Based on this observation, we discard heavyweight graph-based clone detection algorithms and conduct combination experiments to find out a suitable combination of token-based and tree-based approaches for achieving scalable and effective code clone detection. We develop the suitable combination into a tool called TACC and evaluate it with other state-of-the-art code clone detectors. Experimental results indicate that TACC performs better and has the ability to detect large-scale code clones.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要