RobustNPR: Evaluating the robustness of neural program repair models

Journal of Software: Evolution and Process (2024)

Abstract
Because repairing defective programs manually is expensive, much research has focused on automatic program repair (APR). A recent trend in APR is to apply neural networks that automatically mine the relations between defective programs and their corresponding patches, an approach known as neural program repair (NPR). The community, however, has overlooked important properties that affect the applicability of NPR systems, such as robustness: for semantically identical buggy programs, NPR systems may produce entirely different patches. In this paper, we propose RobustNPR, the first robustness evaluation tool for NPR. RobustNPR employs several mutators to generate semantically identical mutants of defective programs. For an original defective program and its mutant, it checks two aspects of NPR: (a) Can NPR fix the mutant when it can fix the original defective program? (b) Can NPR generate semantically identical patches for the original program and the mutant? We then evaluate four state-of-the-art NPR models and analyze the results. We find that even for the best-performing model, 20.16% of repair successes are unreliable, indicating that the robustness of NPR is imperfect. In addition, we find that the robustness of NPR is correlated with model settings and other factors.
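The abstract outlines the evaluation loop: apply a semantics-preserving mutator to a buggy program, run the NPR model on both versions, and compare the outcomes. As a rough illustration only (not the authors' implementation), the sketch below shows one possible mutator, identifier renaming, together with the two robustness checks; the `npr_model.repair` and `tests_pass` interfaces are hypothetical placeholders, and semantic equivalence of patches is approximated by textual equality after mapping the mutant's identifiers back.

```python
# Illustrative sketch of RobustNPR-style robustness checks (hypothetical API).
import re


def rename_variables(source: str, mapping: dict) -> str:
    """Semantics-preserving mutation: rename identifiers via whole-word substitution."""
    for old, new in mapping.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source


def undo_renaming(patch: str, mapping: dict) -> str:
    """Map the mutant's identifiers back so the two patches can be compared."""
    inverse = {new: old for old, new in mapping.items()}
    return rename_variables(patch, inverse)


def check_robustness(npr_model, buggy_src: str, mapping: dict, tests_pass):
    """Return the two robustness signals for one (original, mutant) pair."""
    mutant_src = rename_variables(buggy_src, mapping)
    patch_orig = npr_model.repair(buggy_src)    # hypothetical model interface
    patch_mut = npr_model.repair(mutant_src)

    # (a) The model still fixes the bug after a semantics-preserving change.
    still_fixes = tests_pass(patch_orig) and tests_pass(patch_mut)

    # (b) The two patches agree (approximated here by textual equality
    #     after undoing the renaming on the mutant's patch).
    consistent = patch_orig.strip() == undo_renaming(patch_mut, mapping).strip()
    return still_fixes, consistent
```

In the paper's terms, a repair success that fails either check would count toward the "unreliable" fraction reported for each model.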
Keywords
model evaluation, neural program repair, robustness