Towards data generation to alleviate privacy concerns for cybersecurity applications.

COMPSAC (2023)

Abstract
While data sharing is vital for learning and knowledge development, its effectiveness is limited by privacy concerns and stringent regulations. This issue is particularly acute in cybersecurity applications, where client data often contains confidential and sensitive information. Furthermore, cybersecurity datasets tend to suffer from class imbalance: data related to cyber attacks are rare compared to benign conditions. Hence, machine learning (ML) tasks such as attack detection and classification become challenging. Synthetic tabular data has emerged as a viable alternative that enables data sharing while satisfying regulatory and privacy constraints. In this paper, we present a methodology that utilizes the Intrusion Detection System (IDS) dataset to generate synthetic tabular representational data from the raw dataset while addressing class imbalance during the data generation process. The methodology incorporates a feature selection process that identifies the most important features for accurate data generation, and it demonstrates comparable performance using popular ML techniques on the anomaly detection task. The similarity between the original and generated datasets is evaluated using two metrics, a distribution metric and a data reduction metric, achieving a similarity score of up to 0.97 on the data reduction metric and outperforming a baseline approach that uses all input features by up to 11%.
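The abstract does not define its metrics, so the following is only a minimal sketch of the two ideas it mentions: measuring class imbalance and scoring how closely a synthetic feature follows the original one. The function names (`class_ratio`, `histogram`, `distribution_similarity`), the choice of one minus the total variation distance between histograms as the score, and the toy Gaussian data are all assumptions made for illustration, not the paper's method.

```python
import random
from collections import Counter


def class_ratio(labels):
    """Return the fraction of samples belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def histogram(values, lo, hi, bins=10):
    """Normalized equal-width histogram of `values` over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    n = len(values)
    return [c / n for c in counts]


def distribution_similarity(real, synth, bins=10):
    """Hypothetical score: 1 - total variation distance between histograms.

    1.0 means the binned distributions are identical, 0.0 means no overlap.
    """
    lo = min(min(real), min(synth))
    hi = max(max(real), max(synth))
    p = histogram(real, lo, hi, bins)
    q = histogram(synth, lo, hi, bins)
    tvd = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return 1.0 - tvd


# Toy data (an assumption): 5% attack traffic and one numeric feature.
rng = random.Random(0)
labels = ["benign"] * 950 + ["attack"] * 50
real = [rng.gauss(0, 1) for _ in range(1000)]
synth_good = [rng.gauss(0, 1) for _ in range(1000)]  # same distribution
synth_bad = [rng.gauss(3, 1) for _ in range(1000)]   # shifted distribution

ratios = class_ratio(labels)
score_good = distribution_similarity(real, synth_good)
score_bad = distribution_similarity(real, synth_bad)
```

In this sketch a generator that reproduces the original feature distribution scores higher than one that shifts it, and the class ratios make the rarity of attack samples explicit before any rebalancing is attempted.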
Keywords
Data generation, Generative Adversarial Networks, Intrusion Detection