Software defect prediction employing BiLSTM and BERT-based semantic feature

Soft Computing (2022)

Abstract
In recent years, software defect prediction systems have become popular because they improve software reliability by identifying potential bugs in code. Several models that aim to support developers have been introduced in the literature. Unfortunately, these models rely on manually constructed code features that are fed into machine learning-based classifiers. Moreover, these baseline approaches ignore the semantic and contextual information of the source code. In this paper, we present a software defect prediction model that addresses these issues. The model, SDP-BB, employs a bidirectional long short-term memory network (BiLSTM) and BERT-based semantic features to capture the semantics of code and predict defects in the corresponding software. In particular, it uses the BiLSTM to exploit contextual information from the embedded token vectors learned through the BERT model. It also uses an attention mechanism to capture salient features of the nodes. In addition, a data augmentation technique is used to generate more training data. We evaluated our approach against state-of-the-art models on ten open-source projects, using F1-score as the fault prediction metric. The experiments compared the full-token and AST-node data processing methods, with the length of coverage on each project varied from 50% to 90%, in both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) settings. The results indicate that the proposed method outperforms the competing models.
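The evaluation setup described above can be illustrated with a minimal sketch. It is not the authors' code. The function `f1_score` computes the standard F1 measure for the defective class. The function `length_for_coverage` is a hypothetical helper built on an assumption: that "length of coverage" means choosing a sequence length long enough to hold the full token sequence of a given fraction of files. The abstract does not define the term, so this reading may differ from the paper.

```python
import math


def f1_score(y_true, y_pred):
    """Standard F1 for the defective class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def length_for_coverage(token_lengths, coverage):
    """Hypothetical helper (assumed reading of "length of coverage").

    Returns the smallest length L such that at least `coverage`
    (a fraction in (0, 1]) of the files have L tokens or fewer.
    """
    ordered = sorted(token_lengths)
    k = max(1, math.ceil(coverage * len(ordered)))
    return ordered[k - 1]


if __name__ == "__main__":
    # Made-up labels: 1 = defective file, 0 = clean file.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    print(f1_score(y_true, y_pred))

    # Made-up per-file token counts for one project.
    lengths = [12, 40, 7, 95, 33]
    for c in (0.5, 0.9):
        print(c, length_for_coverage(lengths, c))
```

Under this reading, a lower coverage value gives shorter input sequences and truncates more files, while a higher value preserves more of each file at greater computational cost.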
Keywords
Software reliability, Software defect prediction, BERT, BiLSTM, Attention mechanism, Data augmentation