Predicting Defective Lines Using a Model-Agnostic Technique

IEEE Transactions on Software Engineering(2022)

引用 83|浏览51
暂无评分
摘要
Defect prediction models are proposed to help a team prioritize the areas of source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1-3 percent of lines of a file are defective. Hence, in this work, we propose a novel framework (called Line-DP ) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information why the model makes such a prediction. Broadly speaking, our Line-DP first builds a file-level defect model using code token features. Then, our Line-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our Line-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our Line-DP requires an average computation time of 10 seconds including model construction and defective line identification time. In addition, we find that 63 percent of defective lines that can be identified by our Line-DP are related to common defects (e.g., argument change, condition change). These results suggest that our Line-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost. The contribution of this paper builds an important step towards line-level defect prediction by leveraging a model-agnostic technique.
更多
查看译文
关键词
Software quality assurance,line-level defect prediction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要