Con2Mix: A semi-supervised method for imbalanced tabular security data

Xiaodi Li, Latifur Khan, Mahmoud Zamani, Shamila Wickramasuriya, Kevin Hamlen, Bhavani Thuraisingham

Journal of Computer Security (2023)

Abstract
Con2Mix (Contrastive Double Mixup) is a new semi-supervised learning method that introduces a triplet mixup data augmentation approach for finding code vulnerabilities in imbalanced, tabular security data sets. Tabular data sets in cybersecurity domains are widely known to pose challenges for machine learning because they are heavily imbalanced (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning trains a model on a small subset of labeled data together with a large subset of unlabeled data. While semi-supervised methods have been well studied in image and language domains, they remain underutilized in security domains, especially on tabular security data sets, which pose particularly difficult challenges of contextual information loss and data imbalance. Experiments applying Con2Mix to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance on two evaluated data sets compared with other methods.
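The abstract does not spell out how the triplet mixup is computed. As a rough illustration of the general idea only, the sketch below extends standard mixup (a convex combination of samples) to three tabular feature rows, with weights drawn from a symmetric Dirichlet distribution. The function name `triplet_mixup`, the `alpha` parameter, the Dirichlet weighting, and the example rows are all assumptions made for illustration and are not taken from the Con2Mix paper.

```python
import random


def triplet_mixup(x_a, x_b, x_c, alpha=1.0, rng=None):
    """Convexly combine three numeric feature rows (hypothetical helper).

    Weights are sampled from a symmetric Dirichlet(alpha) distribution,
    obtained here by normalizing three independent Gamma(alpha, 1) draws.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    rng = rng or random.Random()
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(3)]
    total = sum(draws)
    weights = [d / total for d in draws]  # non-negative, sums to 1
    mixed = [
        weights[0] * a + weights[1] * b + weights[2] * c
        for a, b, c in zip(x_a, x_b, x_c)
    ]
    return mixed, weights


# Made-up numeric rows standing in for three tabular samples.
row_a = [0.0, 10.0, 1.0]
row_b = [1.0, 20.0, 0.0]
row_c = [2.0, 30.0, 1.0]

mixed, weights = triplet_mixup(row_a, row_b, row_c, alpha=1.0, rng=random.Random(0))
```

Because the weights are non-negative and sum to one, every mixed feature value lies between the smallest and largest value of that feature across the three input rows. Categorical columns would need separate handling (for example, encoding them before mixing), which this sketch does not attempt.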
Keywords
Semi-supervised learning, contrastive learning, tabular data sets, security data sets