Reinforcement learning based adversarial malware example generation against black-box detectors

Computers & Security (2022)

Abstract
Recent advances in machine learning offer attractive tools for sophisticated adversaries. By employing a dedicated perturbation method, an attacker can transform malware into an adversarial version that retains its malicious functionality. Such adversarial malware examples have proven effective at bypassing antivirus engines. However, recent works leverage only a single perturbation method to generate adversarial examples, which cannot defeat advanced detectors. In this paper, we propose a reinforcement learning-based framework called MalInfo, which generates powerful adversarial malware examples that evade third-party detectors by adaptively selecting a perturbation path for each sample in our collected dataset of 1,000 diverse malware samples. To cope with limited computation, MalInfo applies either dynamic programming or temporal difference learning to choose the optimal perturbation path, where each path is formed by a combination of Obfusmal, Stealmal, and Hollowmal. We provide a proof-of-concept implementation and an extensive evaluation of our framework. Both the detection rate and the evasive rate are substantially improved compared with the state-of-the-art MalFox (Zhong et al., 2021). Specifically, the average detection rates for dynamic programming and temporal difference learning are 23.2% (21.9% lower than MalFox) and 27.5% (7.4% lower than MalFox), respectively, and the average evasive rates are 65.8% (17.1% higher than MalFox) and 59.4% (5.7% higher than MalFox), respectively.
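
To make the idea of learning a perturbation path with temporal difference learning more concrete, the following is a minimal sketch using textbook tabular Q-learning (one temporal difference method). It is not MalInfo's implementation. Only the three perturbation names come from the abstract. The path length, the `toy_detection_rate` function, its `TOY_EFFECT` numbers, and all hyperparameters are hypothetical stand-ins, and no real malware or detector is involved.

```python
import random

# Perturbation names come from the abstract; everything else is hypothetical.
ACTIONS = ["Obfusmal", "Stealmal", "Hollowmal"]
PATH_LENGTH = 3  # assumed number of perturbations per path

# Made-up per-perturbation reduction in detection rate (not from the paper).
TOY_EFFECT = {"Obfusmal": 0.20, "Stealmal": 0.30, "Hollowmal": 0.25}


def toy_detection_rate(path):
    """Stand-in for a black-box detector: repeating a perturbation helps less."""
    rate = 0.90
    seen = {}
    for action in path:
        k = seen.get(action, 0)
        rate -= TOY_EFFECT[action] * (0.5 ** k)
        seen[action] = k + 1
    return rate


def train_q_learning(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning; a state is the tuple of perturbations applied so far."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return, implicitly 0.0 if unseen

    def best_value(state):
        return max(q.get((state, a), 0.0) for a in ACTIONS)

    for _ in range(episodes):
        state = ()
        while len(state) < PATH_LENGTH:
            # Epsilon-greedy choice of the next perturbation.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            next_state = state + (action,)
            done = len(next_state) == PATH_LENGTH
            # Reward arrives only at the end: lower detection is better.
            reward = -toy_detection_rate(next_state) if done else 0.0
            target = reward if done else reward + gamma * best_value(next_state)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)  # TD update
            state = next_state
    return q


def greedy_path(q):
    """Follow the highest-valued action from the empty path to full length."""
    state = ()
    while len(state) < PATH_LENGTH:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state = state + (action,)
    return state


q_table = train_q_learning()
best_path = greedy_path(q_table)
print(best_path, round(toy_detection_rate(best_path), 3))
```

In this toy setting the learned path combines all three perturbations, because the stand-in detector penalizes repeats. A dynamic programming variant would instead enumerate the small path tree and back up values exactly, which is feasible here only because the assumed state space is tiny.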
Keywords
Adversarial malware examples, Dynamic programming, Malware, Reinforcement learning, Temporal difference learning