A distributed counter-based non-blocking coordinated checkpoint algorithm for grid computing applications

Advances in Computational Tools for Engineering Applications (2012)

Abstract
Distributed systems offer many opportunities for failure. Any component of any compute node can fail, including but not limited to the processor, disk, memory, or network interface. Any of these failures will cause the processes running on the affected node to crash or produce incorrect results. The common way to preserve the progress of these processes is to take checkpoints, but this becomes more complicated when the processes communicate with one another. This paper presents a distributed non-blocking coordinated checkpointing algorithm that guarantees globally consistent checkpoint images. These consistent checkpoint images can be used to migrate application processes to different computing nodes when a failure occurs.
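To make "globally consistent" concrete, the sketch below checks a set of per-process checkpoints for orphan messages, meaning messages that a receiver's checkpoint records as received while the sender's checkpoint does not record them as sent. It is only an illustration of the general consistency condition using per-channel message counters, not the algorithm from this paper, whose details the abstract does not give. The checkpoint layout and the names `orphan_channels` and `is_consistent` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption for this sketch): each process id maps to
# a checkpoint holding two counter tables keyed by peer id:
#   "sent"[peer]     = messages sent to that peer before the checkpoint
#   "received"[peer] = messages received from that peer before the checkpoint
Checkpoint = Dict[str, Dict[int, int]]


def orphan_channels(checkpoints: Dict[int, Checkpoint]) -> List[Tuple[int, int]]:
    """Return (sender, receiver) channels that carry an orphan message.

    A channel is orphaned when the receiver's checkpoint counts more
    messages from the sender than the sender's checkpoint counts as sent.
    """
    orphans = []
    for receiver, cp in checkpoints.items():
        for sender, received in cp["received"].items():
            sent = checkpoints[sender]["sent"].get(receiver, 0)
            if received > sent:
                orphans.append((sender, receiver))
    return sorted(orphans)


def is_consistent(checkpoints: Dict[int, Checkpoint]) -> bool:
    """A global checkpoint is consistent when no channel has an orphan message."""
    return not orphan_channels(checkpoints)


# Process 0 sent 3 messages to process 1, which had received 2 of them:
# one message is still in transit, which does not break consistency.
consistent = {
    0: {"sent": {1: 3}, "received": {1: 1}},
    1: {"sent": {0: 1}, "received": {0: 2}},
}

# Process 1 claims 4 messages from process 0, which only recorded 3 sends.
inconsistent = {
    0: {"sent": {1: 3}, "received": {1: 1}},
    1: {"sent": {0: 1}, "received": {0: 4}},
}
```

In this sketch in-transit messages are tolerated and only orphan messages make a set of checkpoints unusable for recovery. A real coordinated protocol would also need to decide how in-transit messages are logged or replayed.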
Key words
checkpointing, distributed algorithms, grid computing, application process migration, crash, distributed counter-based nonblocking coordinated checkpoint algorithm, distributed systems, failures, global consistent checkpoint images, inter-communication processes, coordinated checkpointing, consistent state, fault-tolerance