Online Failure Prediction Through Fault Injection and Machine Learning: Methodology and Case Study

2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE) (2023)

Abstract
Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations.
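The abstract notes that repeated experiments and autocorrelation need specific handling when models are assessed. One plausible illustration, which is not the paper's own procedure, is to keep every sample from a given fault injection run on the same side of the train/test split, so that near-identical consecutive samples cannot appear in both sets. The sketch below assumes a hypothetical sample layout of `(run_id, features, label)` tuples; the function name `split_by_run` and the toy data are likewise invented for illustration.

```python
import random
from collections import defaultdict


def split_by_run(samples, test_fraction=0.25, seed=0):
    """Split samples so each fault injection run is wholly in train or in test.

    `samples` is a list of (run_id, features, label) tuples.
    This layout is an assumption made for the example.
    """
    # Group samples by the run that produced them.
    by_run = defaultdict(list)
    for sample in samples:
        by_run[sample[0]].append(sample)

    # Shuffle run identifiers (not individual samples) reproducibly.
    run_ids = sorted(by_run)
    random.Random(seed).shuffle(run_ids)

    # Hold out a fraction of whole runs for testing (at least one run).
    n_test = max(1, round(len(run_ids) * test_fraction))
    test_runs = set(run_ids[:n_test])

    train = [s for r in run_ids if r not in test_runs for s in by_run[r]]
    test = [s for r in run_ids if r in test_runs for s in by_run[r]]
    return train, test


# Toy data: 8 runs, each with 5 consecutive samples; the label marks the
# last two samples of every run as "failure imminent".
samples = [(run, [float(t)], int(t >= 3)) for run in range(8) for t in range(5)]
train, test = split_by_run(samples)

train_runs = {s[0] for s in train}
test_runs = {s[0] for s in test}
print(len(train), len(test), train_runs & test_runs)
```

Shuffling individual samples instead of runs would let samples taken moments apart in the same experiment land on both sides of the split, which is one way an evaluation can look better than it would on genuinely unseen runs.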
Key words
Dependability, Failure Prediction, Fault Injection, Machine Learning