Two Can Play That Game

Ankit Shah, Arunesh Sinha, Rajesh Ganesan, Sushil Jajodia, Hasan Cam

ACM Transactions on Intelligent Systems and Technology (2020)

Abstract

Cyber-security is an important societal concern. Cyber-attacks have increased in number as well as in the extent of damage caused by each attack. Large organizations operate a Cyber Security Operation Center (CSOC), which forms the first line of cyber-defense. The inspection of cyber-alerts is a critical part of CSOC operations (defender or blue team). Recent work proposed a reinforcement learning (RL) based approach for the defender’s decision-making to prevent the cyber-alert queue length from growing large and overwhelming the defender. In this article, we perform a red team (adversarial) evaluation of this approach. With the recent attacks on learning-based decision-making systems, it is even more important to test the limits of the defender’s RL approach. Toward that end, we learn several adversarial alert generation policies and the best response against them for various defender inspection policies. Surprisingly, we find the defender’s policies to be quite robust to the best response of the attacker. In order to explain this observation, we extend the earlier defender’s RL model to a game model with adversarial RL, and show that there exist defender policies that can be robust against any adversarial policy. We also derive a competitive baseline from the game theory model and compare it to the defender’s RL approach. However, when we go further to exploit the assumptions made in the Markov Decision Process (MDP) in the defender’s RL model, we discover an attacker policy that overwhelms the defender. We use a double-oracle-like approach to retrain the defender with episodes from this discovered attacker policy. This made the defender robust to the discovered attacker policy, and no further harmful attacker policies were discovered. Overall, the adversarial RL and double oracle approaches are general techniques that are applicable to other uses of RL in adversarial environments.
