Impact of Defect Instances for Successful Deep Learning-based Automatic Program Repair

2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)(2022)

引用 0|浏览14
暂无评分
摘要
Deep learning-based automatic program repair (DL-APR) returns a patch code when given a defect code. Recent studies on DL-APR techniques have focused on the training phase to generate more accurate patches; however, a trained model cannot always generate an accurate patch for every new defect code, as the training dataset does not completely represent the new defects to be input in the future. DL-APR researchers should study a method to elicit the best performance on new inputs from the trained and deployed model. A new defect instance (i.e., defect codes and their context codes) is one of the crucial input data that determine the accuracy of the DL-APR, which can be changed and improved. We improve the quality of new input defect instances by focusing on the presence of noise tokens which compromise the defect instances’ quality, thus impairing the accuracy of generated patches. This paper shows that 1) there are noise tokens which prevent correct patch generation (inference) in a new defect instance, and 2) it is necessary to mask these noise tokens to avoid their usage in inferencing patch codes. In order to validate these two assertions, we use a state-of-the-art DL-APR technique and a genetic algorithm to generate near-optimal defect instances which maximize the patch generation accuracy (i.e., the BLEU score) of 4,573 defect instances. Based on optimization results, we found that 1) noise tokens impair patch generation accuracy in approximately 49% of instances, and 2) if these tokens are precluded from inference by masking them, we can improve patch generation accuracy by 88%. The results suggest that future work is required to automatically remove noise tokens from new defect instances so that the trained patch generator generates better patches.
更多
查看译文
关键词
Deep learning,Automatic program repair,Optimization,Masking,Noise token
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要