Semi-supervised anomaly detection on a Tier-0 HPC system

ACM International Conference on Computing Frontiers (CF), 2022

Abstract
Automated, data-driven methodologies are being introduced to help system administrators manage increasingly complex modern HPC systems. Anomaly detection (AD) is integral to improving overall availability: it eases the system administrators' burden and shortens the time between an anomaly and its resolution. This work improves on the current state-of-the-art (SoA) AD model by taking temporal dependencies in the data into account, adding long short-term memory (LSTM) cells to the architecture of the AD model. The proposed model is evaluated on a complete ten-month history of a Tier-0 system (Marconi100 at CINECA, consisting of 985 nodes). It achieves an area under the curve (AUC) of 0.758, compared with an AUC of 0.747 for the state-of-the-art approach.
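For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC can be computed from per-sample anomaly scores and ground-truth labels. It uses the pairwise-ranking definition of the area under the ROC curve. The function name `roc_auc` and the toy scores and labels are assumptions made for illustration only; they are not the paper's data, model, or evaluation code.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen anomalous sample
    (label 1) receives a higher score than a randomly chosen normal
    sample (label 0); tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (hypothetical): higher score = more anomalous.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.758 and 0.747 figures above.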
Keywords
Machine Learning, Anomaly Detection, High Performance Computing, Semi-supervised Learning