A Study on VLG Recovery Scheme for Distributed Systems
semanticscholar(2016)
Abstract
Long-running multiprocessor scientific applications are often subject to failures. A failure means loss of computation, sometimes a significant amount. However, not all failures can be treated alike. Failures can be classified in many ways: by scope, into failures affecting one processor, multiple processors, or the system as a whole; by duration, into transient and permanent failures; or by likelihood, into less probable and highly probable failures. Irrespective of the classification method, failures can be characterised by two aspects: impact intensity and occurrence frequency. This paper introduces a three-level coordinated checkpointing scheme, which we call the VLG checkpointing scheme, for multiprocessor and distributed systems. By taking these two aspects of failures into account, the three-level VLG recovery scheme tolerates failures with lower average performance overhead.