Practical experiences with chronics discovery in large telecommunications systems

SOSP (2012)

Abstract
Chronics are recurrent problems that fly under the radar of operations teams because they do not perturb the system enough to set off alarms or violate service-level objectives. The discovery and diagnosis of never-before-seen chronics poses new challenges because they are not detected by traditional threshold-based techniques, and many chronics can be present in a system at once, all starting and ending at different times. In this paper, we describe our experiences diagnosing chronics using server logs on a large telecommunications service. Our technique uses a scalable Bayesian distribution learner coupled with an information-theoretic measure of distance (KL divergence) to identify the attributes that best distinguish failed calls from successful calls. Our preliminary results demonstrate the usefulness of our technique by providing examples of actual instances where we helped operators discover and diagnose chronics.
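The abstract does not specify the learner or exactly how KL divergence is applied, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. Each call is assumed to be a dict of log attributes (the names `server` and `codec` below are hypothetical), paired with a failed/successful flag. For every attribute, the value distribution among failed calls and among successful calls is estimated with add-alpha smoothing (the posterior mean under a symmetric Dirichlet prior, a simple stand-in for a Bayesian learner), and attributes are ranked by KL(P_fail || P_success). A high score means the attribute's values look very different on failed calls, which makes it a candidate for diagnosis.

```python
import math
from collections import Counter, defaultdict


def smoothed_distribution(counts, support, alpha=1.0):
    """Posterior-mean estimate of a categorical distribution under a
    symmetric Dirichlet(alpha) prior (i.e. add-alpha smoothing)."""
    total = sum(counts[v] for v in support) + alpha * len(support)
    return {v: (counts[v] + alpha) / total for v in support}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q share a support and q is positive on it."""
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items() if pv > 0)


def rank_attributes(calls, alpha=1.0):
    """Rank attributes by how strongly their value distribution differs
    between failed and successful calls (assumed scoring, for illustration).

    calls: iterable of (attributes: dict[str, str], failed: bool)
    """
    failed_counts = defaultdict(Counter)
    ok_counts = defaultdict(Counter)
    for attrs, failed in calls:
        target = failed_counts if failed else ok_counts
        for name, value in attrs.items():
            target[name][value] += 1

    scores = {}
    for name in set(failed_counts) | set(ok_counts):
        # Smoothing keeps every value seen in either class strictly positive,
        # so the KL divergence stays finite.
        support = set(failed_counts[name]) | set(ok_counts[name])
        p_fail = smoothed_distribution(failed_counts[name], support, alpha)
        p_ok = smoothed_distribution(ok_counts[name], support, alpha)
        scores[name] = kl_divergence(p_fail, p_ok)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up call records: every failed call went through server "s3".
    calls = [
        ({"server": "s1", "codec": "g711"}, False),
        ({"server": "s2", "codec": "g711"}, False),
        ({"server": "s3", "codec": "g729"}, True),
        ({"server": "s3", "codec": "g711"}, True),
        ({"server": "s1", "codec": "g729"}, False),
    ]
    for name, score in rank_attributes(calls):
        print(f"{name}: {score:.3f} bits")  # "server" ranks above "codec"
```

In this toy data the `server` attribute scores about 0.70 bits and `codec` about 0.03 bits, so `server` is flagged first. Scoring whole attributes is a design choice made here for brevity; a practical tool would likely also report which specific values (here `s3`) drive the divergence.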
Keywords
information theoretic measure,practical experience,actual instance,new challenge,kl divergence,diagnose chronics,operations team,preliminary result,chronics discovery,large telecommunications service,traditional threshold-based technique,different time,information-theoretic measure,large telecommunications system,service level