Log Analytics in HPC: A Data-Driven Reinforcement Learning Framework

Zhengping Luo, Tao Hou, Tung Thanh Nguyen, Hui Zeng, Zhuo Lu

IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2020

Abstract

High Performance Computing (HPC) is employed in many fields, such as aerospace, weather forecasting, numerical simulation, and scientific research. The security of HPC systems, especially anomaly and intrusion detection, has attracted considerable attention in recent years. Because HPC systems are heavily instrumented, logs are an effective and direct data source for evaluating system status and, further, for detecting anomalies or malicious users. In this paper, we offer a novel perspective: we treat anomaly detection in HPC as a sequential decision process and apply reinforcement learning techniques to learn the state transition process. On this basis we build a framework named ReLog to detect anomalies or malicious users. Because a common challenge in applying machine learning techniques is the lack of sufficient data, we also provide a generative adversarial network (GAN)-based solution to generate sufficient training data in HPC. The experimental validation is conducted on MPI logs collected from real-world systems, and our results demonstrate a detection accuracy of 93% on the collected dataset.
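To make the idea of treating log-based anomaly detection as a sequential decision process concrete, the following is a minimal toy sketch. It is not the ReLog algorithm, whose details are not given in the abstract. It assumes a tabular Q-learning agent whose state is the current log event and whose action is a guess for the next event, with a reward of +1 for a correct guess and -1 otherwise. The event names, the reward design, the hyperparameters, and the functions `train_q` and `anomaly_score` are all hypothetical, and the toy assumes each event has a single normal successor.

```python
from collections import defaultdict


def train_q(traces, events, epochs=20, alpha=0.5, gamma=0.5):
    """Tabular Q-learning over log-event transitions (illustrative assumption).

    State  = current log event.
    Action = guess for the next log event.
    Reward = +1 if the guess matches the observed next event, else -1.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for trace in traces:
            for cur, nxt in zip(trace, trace[1:]):
                # Sweep every action so that training is deterministic.
                for action in events:
                    reward = 1.0 if action == nxt else -1.0
                    best_next = max(q[(nxt, a)] for a in events)
                    target = reward + gamma * best_next
                    q[(cur, action)] += alpha * (target - q[(cur, action)])
    return q


def anomaly_score(q, trace):
    """Fraction of observed transitions whose learned Q-value is not positive."""
    steps = list(zip(trace, trace[1:]))
    if not steps:
        return 0.0
    bad = sum(1 for cur, nxt in steps if q.get((cur, nxt), 0.0) <= 0.0)
    return bad / len(steps)


# Hypothetical MPI-style event vocabulary and a single "normal" trace.
EVENTS = ["init", "send", "recv", "finalize"]
NORMAL = [["init", "send", "recv", "finalize"]]

q_table = train_q(NORMAL, EVENTS)
normal_score = anomaly_score(q_table, ["init", "send", "recv", "finalize"])
shuffled_score = anomaly_score(q_table, ["init", "recv", "send", "finalize"])
unknown_score = anomaly_score(q_table, ["init", "send", "recv", "exec"])
```

In this sketch a trace would be flagged when its score exceeds a chosen threshold. Transitions never seen during training, including events outside the vocabulary, keep the default Q-value of zero and therefore count as suspicious.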

Keywords

High performance computing, security, reinforcement learning, defenses and attacks, log analytics
