Improving Resilience of Software Systems: A Case Study in 3D-Online Game System

INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING (2017)

Abstract
Resilience is the property that enables a system to continue operating properly when one or more faults occur. As software systems grow more complex, their hardware execution platforms also become larger and more heterogeneous. Software systems may fail due to faults such as node breakdown, communication failure, or data processing failure. In this paper, we propose a ring-based resilience mechanism that implements fault detection and recovery. (1) To relieve the central server of a potentially heavy network traffic burden, we design a ring-based heartbeat algorithm for crash fault detection. (2) We also design a light-weight mechanism for recovering from crash faults, in contrast to the current system-specific mechanisms. To evaluate our mechanism, we use a 3D-online game system as a case study. By injecting faults, we test the effectiveness and overhead of the proposed mechanism. The experimental results show that, compared with other mechanisms, our mechanism supports resilience well and, with acceptable overhead, is better at dealing with crash faults caused by high cluster workload.
Key words
Heartbeat, ring-based, light-weight recovery, resilience
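
The abstract does not give the details of the ring-based heartbeat algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each node watches only its successor in the ring, so no central server has to receive every heartbeat. The `Ring` class, its method names, the 3-second timeout, and the ring-repair step in `remove` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Hypothetical ring in which each node watches only its successor."""
    nodes: list
    timeout: float = 3.0  # assumed value, not taken from the paper
    last_seen: dict = field(default_factory=dict)

    def successor(self, node):
        # The node that `node` is responsible for watching.
        i = self.nodes.index(node)
        return self.nodes[(i + 1) % len(self.nodes)]

    def record_heartbeat(self, sender, now):
        # Called when the watcher of `sender` receives its heartbeat.
        self.last_seen[sender] = now

    def suspects(self, now):
        # Return (watcher, suspect) pairs whose heartbeat has timed out.
        found = []
        for watcher in self.nodes:
            target = self.successor(watcher)
            if target == watcher:
                continue  # a one-node ring has nothing to watch
            if now - self.last_seen.get(target, 0.0) > self.timeout:
                found.append((watcher, target))
        return found

    def remove(self, node):
        # Assumed ring repair: drop the crashed node so that its watcher
        # starts watching the next node in the ring.
        self.nodes.remove(node)
        self.last_seen.pop(node, None)


# Simulated run: all four nodes beat until t=2, then C stops (a crash fault).
ring = Ring(nodes=["A", "B", "C", "D"])
for t in (0.0, 1.0, 2.0):
    for n in ring.nodes:
        ring.record_heartbeat(n, t)
for t in (3.0, 4.0, 5.0, 6.0):
    for n in ("A", "B", "D"):
        ring.record_heartbeat(n, t)

detected = ring.suspects(now=6.0)  # B watches C, so B reports the timeout
for _, crashed in detected:
    ring.remove(crashed)
```

In this sketch every node exchanges heartbeats with one neighbour, so the per-node heartbeat traffic stays constant as the cluster grows, which is the kind of load reduction the abstract motivates. The actual recovery mechanism of the paper is not modelled here.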