CloudRaid: Detecting Distributed Concurrency Bugs via Log Mining and Enhancement
IEEE Transactions on Software Engineering (2022)
Abstract
Cloud systems suffer from distributed concurrency bugs, which often lead to data loss and service outages. This paper presents CloudRaid, a new automatic tool for finding distributed concurrency bugs efficiently and effectively. Distributed concurrency bugs are notoriously difficult to find because they are triggered by untimely interactions among nodes, i.e., unexpected message orderings. To detect them efficiently and effectively, CloudRaid automatically analyzes and tests only the message orderings that are likely to expose errors. Specifically, CloudRaid mines the logs from previous executions to uncover message orderings that are feasible but inadequately tested. In addition, we propose a log enhancing technique that automatically introduces new logs into the system being tested. These additional logs further improve the effectiveness of CloudRaid without introducing any noticeable performance overhead. Its log-based approach makes CloudRaid well suited to live systems. We have applied CloudRaid to six representative distributed systems: Hadoop2/Yarn, HBase, HDFS, Cassandra, Zookeeper, and Flink. CloudRaid tested 60 different versions of these six systems (10 versions per system) in 35 hours, uncovering 31 concurrency bugs, including nine new bugs that had never been reported before. All nine new bugs have been confirmed by the original developers; three are critical and have already been fixed.
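The log-mining idea can be illustrated with a minimal sketch. It is not CloudRaid's actual algorithm: the function name `untested_orderings` and the representation of each run as a list of message IDs in observed order are assumptions made for illustration. The sketch only finds message pairs that every previous run saw in a single order, and it does not check whether the flipped order is actually feasible, which the abstract says CloudRaid also determines.

```python
from itertools import combinations


def untested_orderings(runs):
    """Return message pairs (a, b) seen only as a-before-b in all runs.

    `runs` is a list of runs; each run is a list of message IDs in the
    order they were logged (a hypothetical, simplified log format).
    """
    seen = set()  # ordered pairs (earlier, later) observed in any run
    for run in runs:
        # combinations() preserves positional order, so the first
        # element of each pair was logged before the second.
        for earlier, later in combinations(run, 2):
            seen.add((earlier, later))
    # Keep pairs whose flipped order never appeared in any run.
    return sorted(
        (a, b) for (a, b) in seen if a != b and (b, a) not in seen
    )


if __name__ == "__main__":
    runs = [["m1", "m2", "m3"], ["m2", "m1", "m3"]]
    print(untested_orderings(runs))
```

In this example, `m1` and `m2` were already observed in both orders, so only the pairs involving `m3` remain as candidates for a test that tries to flip their order.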
Keywords
Distributed systems, concurrency bugs, bug detection, cloud computing