Locating Anomaly Clues for Atypical Anomalous Services: An Industrial Exploration

IEEE Transactions on Dependable and Secure Computing (2023)

Abstract
Continuity and steadiness are vital for services with massive user bases, which requires that service anomalies be detected and resolved in a timely manner. Our previous work proposed a tool, ImpAPTr (Impact Analysis based on Pruning Tree), to identify combinations of multi-dimensional attributes as clues leading to the root cause of service anomalies. However, ImpAPTr applies a threshold-driven strategy: it must be triggered by a drop of at least 0.05% in the success rate of service calls (abbr. SRSC), which can be problematic in a situation that is atypical yet pervasive in field application. For example, a combination of trivial anomalies (i.e., each causing a drop of less than 0.05% in SRSC) can lead to a drop in SRSC far greater than 0.05%. In addition, a suitable threshold is usually hard to determine. To address these problems, we propose a new method, ImpAPTr+, that removes the constraint of the 0.05% threshold. The basic idea is to involve the time dimension and identify clues across multiple time intervals of data. We evaluated three methods (ImpAPTr+, R-Adtributor, and Squeeze) on both a production environment dataset and a simulation dataset. The former is retrieved directly from the service monitoring data of Meituan, one of the largest on-line service providers worldwide. The latter is constructed from monitoring data of the same company. The results indicate that: (1) ImpAPTr+ outperforms previous approaches by a large margin in terms of accuracy; (2) both ImpAPTr+ and R-Adtributor are able to find proper clues within seconds; and (3) ImpAPTr+ tends to find proper clues with shorter time intervals (i.e., less data), which implies that the method is more suitable for near real-time monitoring scenarios.
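The threshold problem described in the abstract can be illustrated with a small arithmetic sketch. This is not the ImpAPTr or ImpAPTr+ algorithm; it only shows how several anomalies that each stay below a 0.05% trigger can add up to a drop well above it. The call volume, the attribute names, and the failure counts are hypothetical, and the sketch assumes the failed calls of the three anomalies do not overlap.

```python
from fractions import Fraction

# 0.05% drop in SRSC, expressed as an exact fraction (threshold value from the abstract).
THRESHOLD = Fraction(5, 10000)


def srsc_drop(extra_failures, total_calls):
    """Drop in success rate caused by extra failed calls out of total_calls."""
    return Fraction(extra_failures, total_calls)


# Hypothetical numbers for illustration only.
TOTAL_CALLS = 1_000_000
# Extra failed calls attributed to three separate attribute values
# (assumed disjoint, so their failures can simply be summed).
extra_failures = {"city=A": 300, "os=B": 300, "app_version=C": 300}

# Drop caused by each anomaly on its own: 0.03% each.
individual = {k: srsc_drop(v, TOTAL_CALLS) for k, v in extra_failures.items()}
# Drop caused by all three together: 0.09%.
combined = srsc_drop(sum(extra_failures.values()), TOTAL_CALLS)

# A per-anomaly threshold check fires for none of them...
triggered_individually = [k for k, d in individual.items() if d >= THRESHOLD]
# ...while the overall drop exceeds the threshold.
triggered_combined = combined >= THRESHOLD

print("individually triggered:", triggered_individually)
print("combined drop: {:.2%}, triggered: {}".format(float(combined), triggered_combined))
```

In this setting no single attribute value reaches the trigger, yet the service as a whole loses almost twice the threshold, which is the kind of case a purely threshold-driven strategy can miss.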
Keywords
Anomaly clues locating, multiple dimensional attributes, on-line service monitoring