Coverage Guided Fault Injection for Cloud Systems.

ICSE(2023)

Cited 0|Views24
No score
Abstract
To support high reliability and availability, modern cloud systems are designed to be resilient to node crashes and reboots. That is, a cloud system should gracefully recover from node crashes/reboots and continue to function. However, node crashes/reboots that occur under special timing can trigger crash recovery bugs that lie in incorrect crash recovery protocols and their implementations. To ensure that a cloud system is free from crash recovery bugs, some fault injection approaches have been proposed to test whether a cloud system can correctly recover from various crash scenarios. These approaches are not effective in exploring the huge crash scenario space without developers' knowledge. In this paper, we propose CrashFuzz, a fault injection testing approach that can effectively test crash recovery behaviors and reveal crash recovery bugs in cloud systems. CrashFuzz mutates the combinations of possible node crashes and reboots according to runtime feedbacks, and prioritizes the combinations that are prone to increase code coverage and trigger crash recovery bugs for smart exploration. We have implemented CrashFuzz and evaluated it on three popular open-source cloud systems, i.e., ZooKeeper, HDFS and HBase. CrashFuzz has detected 4 unknown bugs and 1 known bug. Compared with other fault injection approaches, CrashFuzz can detect more crash recovery bugs and achieve higher code coverage.
More
Translated text
Key words
cloud system,crash recovery bug,fault injection,bug detection,fuzzing
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined