What Do Developer-Repaired Flaky Tests Tell Us About the Effectiveness of Automated Flaky Test Detection?

2022 IEEE/ACM International Conference on Automation of Software Test (AST)

Abstract
Because they pass or fail without code changes, flaky tests cause serious problems such as spuriously failing builds and the erosion of developers' trust in tests. Many previous evaluations of automated flaky test detection techniques do not accurately assess their usefulness for the developers who identify flaky tests in order to repair them. This is because researchers evaluate detection techniques against baselines that are not derived from past developer behavior, or against no baselines at all. To study the effectiveness of an automated test rerunning technique, a common baseline for other approaches to detection, this paper uses 75 commits, authored by human software developers, that repair test flakiness in 31 real-world Python projects. Surprisingly, automated rerunning detects the developer-repaired flaky tests in only 40% of the studied commits. This result suggests that automated rerunning does not often find those flaky tests that developers fix, implying that it is an unsuitable baseline for assessing a detection technique's usefulness for developers.

CCS Concepts
• Software and its engineering → Software testing and debugging.

ACM Reference Format:
Owain Parry, Gregory M. Kapfhammer, Michael Hilton, and Phil McMinn. 2022. What Do Developer-Repaired Flaky Tests Tell Us About the Effectiveness of Automated Flaky Test Detection?. In IEEE/ACM 3rd International Conference on Automation of Software Test (AST '22), May 17-18, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3524481.3527227
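To illustrate the kind of detection baseline the abstract refers to, the following is a minimal sketch of rerunning-based flaky test detection, not the authors' tooling: a test whose outcome varies across repeated executions of unchanged code is flagged as flaky. The test identifier, rerun count, and pytest invocation are illustrative assumptions.

```python
# Sketch of rerunning-based flaky test detection (illustrative, not the paper's tool).
import subprocess


def observed_outcomes(test_id: str, reruns: int = 10) -> set:
    """Run a single pytest test `reruns` times on the same code and record outcomes."""
    outcomes = set()
    for _ in range(reruns):
        result = subprocess.run(
            ["python", "-m", "pytest", test_id, "-q"],
            capture_output=True,
        )
        outcomes.add("pass" if result.returncode == 0 else "fail")
    return outcomes


def is_flaky(test_id: str, reruns: int = 10) -> bool:
    """Report a test as flaky if it both passed and failed without any code change."""
    return len(observed_outcomes(test_id, reruns)) > 1


if __name__ == "__main__":
    # Hypothetical test identifier; substitute a real test from the project under study.
    print(is_flaky("tests/test_example.py::test_network_timeout"))
```

A rerunning baseline like this only surfaces flakiness that happens to manifest within the chosen number of reruns, which is consistent with the paper's finding that it misses many developer-repaired flaky tests.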
Keywords
Software Testing, Flaky Tests, Automated Detection