On the relative value of clustering techniques for Unsupervised Effort-Aware Defect Prediction

Peixin Yang, Lin Zhu, Yanjiao Zhang, Chuanxiang Ma, Liming Liu, Xiao Yu, Wenhua Hu

Expert Systems with Applications (2024)

Abstract
Unsupervised Effort-Aware Defect Prediction (EADP) builds a model from unlabeled data and ranks software modules according to their feature values. Xu et al. (JSS 2021) explored clustering techniques for unsupervised defect prediction and found that several clustering methods perform better on the effort-aware metric F1@20%. However, their conclusion may not be convincing, because they did not consider the impact of the Initial False Alarms (IFA) metric on unsupervised EADP. Furthermore, their study did not compare against state-of-the-art supervised EADP models. To investigate clustering techniques for unsupervised EADP more comprehensively, we explore the performance of 22 clustering techniques using three classification metrics and six effort-aware metrics. The experimental results demonstrate that (1) K-medoids, the best clustering technique for unsupervised EADP, can significantly reduce the IFA of the ManualUp method to an acceptable range, whereas the clustering techniques recommended by Xu et al. exhibit IFA values too high to be acceptable to testing teams; (2) K-medoids performs better than some supervised EADP methods, especially on metrics such as IFA and PMI@20% (the Proportion of Modules Inspected when inspecting the top 20% of lines of code); (3) better classification performance of clustering techniques could lead to better effort-aware performance. In summary, we recommend the K-medoids clustering technique for unsupervised EADP and suggest that future research devote more effort to exploring better unsupervised clustering techniques. To support reproducibility and future research, we provide the source code used in our study (https://github.com/AndreYang816/Clustering4UEADP).
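As a rough illustration of the terms above, the Python sketch below ranks modules with ManualUp (smallest LOC first) and computes recall, PMI, and IFA within a 20% LOC inspection budget. The clustering step is a hypothetical simplification, not the authors' pipeline (their repository has that): a naive alternating K-medoids seeded with the first k points, plus the assumption that the cluster with the larger mean feature sum is defect-prone and is inspected first in ManualUp order. The helper names (`rank_with_clusters`, `effort_aware_metrics`) and the budget cut-off rule (stop before the module that would exceed the budget) are assumptions; other conventions exist.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence


@dataclass
class Module:
    name: str
    loc: int                 # size in lines of code, used as the effort proxy
    features: List[float]    # software metrics fed to the clustering step
    defective: bool = False  # ground truth, used only for evaluation


def manual_up(modules: Sequence[Module]) -> List[Module]:
    """ManualUp: inspect modules in ascending order of LOC (smallest first)."""
    return sorted(modules, key=lambda m: m.loc)


def k_medoids(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Naive alternating K-medoids; returns a cluster id (0..k-1) for each point."""
    medoids = list(range(k))  # assumption: the first k points seed the medoids
    for _ in range(max_iter):
        # Assignment step: each medoid keeps itself, other points join the nearest medoid.
        clusters: Dict[int, List[int]] = {m: [m] for m in medoids}
        for i, p in enumerate(points):
            if i not in clusters:
                nearest = min(medoids, key=lambda m: dist(p, points[m]))
                clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance to its cluster members.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[o]) for o in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [0] * len(points)
    for cid, members in enumerate(clusters.values()):
        for i in members:
            labels[i] = cid
    return labels


def rank_with_clusters(modules: Sequence[Module], k: int = 2) -> List[Module]:
    """Hypothetical pipeline: flagged cluster first, ManualUp order inside each group."""
    labels = k_medoids([m.features for m in modules], k)

    def mean_feature_sum(cid: int) -> float:
        sums = [sum(m.features) for m, lab in zip(modules, labels) if lab == cid]
        return sum(sums) / len(sums)

    # Assumption: the cluster with the largest mean feature sum is defect-prone.
    prone = max(set(labels), key=mean_feature_sum)
    flagged = [m for m, lab in zip(modules, labels) if lab == prone]
    others = [m for m, lab in zip(modules, labels) if lab != prone]
    return manual_up(flagged) + manual_up(others)


def effort_aware_metrics(ranked: Sequence[Module], effort: float = 0.2) -> Dict[str, float]:
    """Recall and PMI within an `effort` share of total LOC, plus IFA."""
    budget = effort * sum(m.loc for m in ranked)
    inspected, used = [], 0
    for m in ranked:
        if used + m.loc > budget:
            break  # assumption: stop before the module that would exceed the budget
        used += m.loc
        inspected.append(m)
    n_defective = sum(m.defective for m in ranked)
    found = sum(m.defective for m in inspected)
    # IFA: number of non-defective modules inspected before the first defective one.
    ifa = next((i for i, m in enumerate(ranked) if m.defective), len(ranked))
    return {
        "recall": found / n_defective if n_defective else 0.0,
        "pmi": len(inspected) / len(ranked),
        "ifa": ifa,
    }


if __name__ == "__main__":
    mods = [
        Module("A", 10, [1, 1]), Module("B", 20, [1, 2]), Module("C", 30, [2, 1]),
        Module("D", 40, [9, 9], True), Module("E", 200, [8, 9], True), Module("F", 60, [9, 8]),
    ]
    print("ManualUp:       ", effort_aware_metrics(manual_up(mods)))
    print("K-medoids first:", effort_aware_metrics(rank_with_clusters(mods)))
```

On this toy data, plain ManualUp spends its budget on A, B, and C (IFA 3, recall 0.0), while inspecting the flagged cluster first reaches D immediately (IFA 0, recall 0.5). The numbers only illustrate how the metrics behave; they are not results from the study.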
Keywords
Software defect prediction, Effort-aware, Clustering technique, Unsupervised learning