CHAOS: Accurate and Realtime Detection of Aging-Oriented Failure Using Entropy.

CoRR(2015)

引用 24|浏览21
暂无评分
摘要
Even well-designed software systems suffer from chronic performance degradation, also named "software aging", due to internal (e.g. software bugs) and external (e.g. resource exhaustion) impairments. These chronic problems often fly under the radar of software monitoring systems before causing severe impacts (e.g. system failure). Therefore it's a challenging issue how to timely detect these problems to prevent system crash. Although a large quantity of approaches have been proposed to solve this issue, the accuracy and effectiveness of these approaches are still far from satisfactory due to the insufficiency of aging indicators adopted by them. In this paper, we present a novel entropy-based aging indicator, Multidimensional Multi-scale Entropy (MMSE). MMSE employs the complexity embedded in runtime performance metrics to indicate software aging and leverages multi-scale and multi-dimension integration to tolerate system fluctuations. Via theoretical proof and experimental evaluation, we demonstrate that MMSE satisfies Stability, Monotonicity and Integration which we conjecture that an ideal aging indicator should have. Based upon MMSE, we develop three failure detection approaches encapsulated in a proof-of-concept named CHAOS. The experimental evaluations in a Video on Demand (VoD) system and in a real-world production system, AntVision, show that CHAOS can detect the failure-prone state in an extraordinarily high accuracy and a near 0 Ahead-Time-To-Failure (ATTF). Compared to previous approaches, CHAOS improves the detection accuracy by about 5 times and reduces the ATTF even by 3 orders of magnitude. In addition, CHAOS is light-weight enough to satisfy the realtime requirement.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要