Analyzing the Role of Class Rebalancing Techniques in Software Defect Prediction

International Journal of Information Technology & Decision Making (2023)

Abstract
Predicting software defects is an important task during the software testing phase, especially for allocating resources and prioritizing testing tasks. Typically, classification algorithms trained on previously collected datasets are used for this task. However, these datasets suffer from an imbalanced label distribution in which clean modules outnumber defective modules. Traditional classification algorithms handle this imbalance poorly because they assume that the classes are balanced. If the problem is not addressed, the classifier produces predictions biased towards the majority label. Several techniques in the literature address this problem, and most of them focus on data rebalancing. Recently, ensemble class imbalance techniques have emerged as an alternative to data rebalancing approaches. In software defect prediction, no studies have examined the performance of ensemble class imbalance learning against data rebalancing approaches. This paper investigates the efficiency of ensemble class imbalance learning for software defect prediction. We conducted a comprehensive experiment involving 12 datasets, six classifiers, nine class imbalance techniques, and 10 evaluation metrics. The experiments showed that ensemble approaches, particularly the Under Bagging technique, outperform traditional data rebalancing approaches, especially on datasets that have high defect ratios.
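To make the Under Bagging idea named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels (1 = defective, 0 = clean), keeps every minority sample in each bag, draws an equal number of majority samples at random, and combines the base learners by majority vote. The one-feature threshold rule (`train_stump`), the function names, and the toy data are all hypothetical choices made for illustration; the paper's own classifiers and datasets are not reproduced here.

```python
import random


def train_stump(X, y):
    """Fit a one-feature threshold rule; a stand-in base learner for this sketch."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for label_above in (0, 1):
                # Count training errors for "predict label_above when value > threshold".
                errors = sum(
                    (label_above if row[feature] > threshold else 1 - label_above) != target
                    for row, target in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, label_above)
    _, feature, threshold, label_above = best
    return lambda row: label_above if row[feature] > threshold else 1 - label_above


def under_bagging_fit(X, y, n_estimators=11, seed=0):
    """Train each base learner on a balanced bag: all minority samples plus
    an equally sized random draw (with replacement) from the majority class."""
    rng = random.Random(seed)
    defective = [i for i, label in enumerate(y) if label == 1]
    clean = [i for i, label in enumerate(y) if label == 0]
    minority, majority = sorted((defective, clean), key=len)
    models = []
    for _ in range(n_estimators):
        sampled = [rng.choice(majority) for _ in minority]
        bag = minority + sampled
        models.append(train_stump([X[i] for i in bag], [y[i] for i in bag]))
    return models


def under_bagging_predict(models, row):
    """Majority vote over the base learners; returns 1 (defective) or 0 (clean)."""
    votes = sum(model(row) for model in models)
    return 1 if 2 * votes > len(models) else 0


# Toy, made-up data: one hypothetical complexity score per module,
# 20 clean modules and 3 defective ones.
X = [[score] for score in range(20)] + [[30], [35], [40]]
y = [0] * 20 + [1] * 3
models = under_bagging_fit(X, y)
```

Calling `under_bagging_predict(models, [50])` classifies a module with a very high score, and `under_bagging_predict(models, [0])` one with a very low score. Because every bag is balanced, no base learner sees the original skew towards clean modules, which is the property the abstract attributes to ensemble class imbalance learning.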
Keywords
Software defect prediction, class imbalance, machine learning