An Empirical Analysis of Code Clone Authorship in Apache Projects.

IWSC (2023)

Abstract
Many studies have been conducted to identify various types of code clones with a focus on accuracy, scalability, and performance. However, there has been limited exploration into the nature of code clones. Even fundamental questions, such as whether authors who write many non-clone lines also tend to write many clone lines, or whether code snippets in the same clone set were written by the same author or different authors, have not been thoroughly investigated. In this paper, we explore such fundamental questions regarding code clone authorship. We analyzed Java files from 153 Apache projects on GitHub, with a focus on line-level granularity. The analysis results showed that for 150 out of the 153 projects, the numbers of non-clone lines and clone lines contributed by each author are linearly correlated. We also found that two-thirds of the clone sets in all projects are primarily contributed to by single leading authors. These results confirm our intuitive understanding of clone characteristics, even though no previous publications have provided empirical validation data from multiple projects. Since these results could assist in designing better clone management methods, we will explore the implications of developing an effective clone management tool.
Keywords
authorship, git blame, single-leader clone set, multi-leader clone set
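
The two measurements named in the abstract (the per-author correlation between non-clone and clone line counts, and the leading author of a clone set) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each source line has already been labeled with an author (for example from git blame output) and a clone flag (from some clone detector). The function names, the input format, and the 0.5 share threshold used to call a clone set "single-leader" are all hypothetical choices made for illustration; the abstract does not state the paper's exact definition.

```python
from collections import Counter
from math import sqrt


def per_author_counts(lines):
    """Count non-clone and clone lines per author.

    `lines` is an iterable of (author, is_clone) pairs, one per source line.
    This input format is an assumption, not the paper's data model.
    Returns a list of (author, non_clone_count, clone_count), sorted by author.
    """
    clone, non_clone = Counter(), Counter()
    for author, is_clone in lines:
        (clone if is_clone else non_clone)[author] += 1
    authors = sorted(set(clone) | set(non_clone))
    return [(a, non_clone[a], clone[a]) for a in authors]


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def leading_author(clone_set_authors, threshold=0.5):
    """Return (author, share) if one author wrote more than `threshold`
    of the lines in a clone set, else (None, top_share).

    The threshold is a hypothetical stand-in for the paper's definition
    of a single-leader clone set.
    """
    counts = Counter(clone_set_authors)
    author, top = counts.most_common(1)[0]
    share = top / len(clone_set_authors)
    return (author, share) if share > threshold else (None, share)
```

For one project, `per_author_counts` yields one data point per author, and `pearson` applied to the non-clone and clone columns gives the kind of per-project correlation the abstract reports. `leading_author` is applied once per clone set, to the list of authors of its lines.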