Self-adaptive anomaly detection with deep reinforcement learning and topology

Qihong Shao, Carlos M. Pignataro

semanticscholar (2021)

Abstract
In the networking field, network topology is one of the most important perspectives, as it can bring additional insights to the modeling process. Existing anomaly detection approaches do not take topology information into consideration. To address such limitations, techniques are presented herein that support deep Convolutional Neural Network (CNN) modeling with Reinforcement Learning (RL) employing, for example, an Advantage Actor Critic (A2C) algorithm. Additionally, aspects of the techniques presented herein support an innovative new way to model a "customer profile" that leverages topology information.

DETAILED DESCRIPTION

Anomaly detection is an important part of any system, from an online system to a network. It can be a very challenging task due to, for example, the lack of abnormal data points and insufficient knowledge. Even with a large amount of data, it is still difficult to identify and label an "abnormal" data point. Traditional machine learning (ML) and deep learning (DL) models for anomaly detection struggle to reach a desirable level of accuracy and may raise frequent false alarms. Such models are limited by a range of conditions, including:

- They are typically based on a strong assumption about the underlying mechanism of anomaly patterns. Both ML and DL methods assume the data follows either a distribution or a latent space and try to find the underlying models, which may not produce satisfactory results under scenarios where those assumptions do not hold.
- They typically make threshold setting tedious. In some instances, hundreds or thousands of parameters may need to be tuned, and in practice it is not feasible for a customer to properly set up and customize their solutions.
- Significant human effort is typically needed to customize such models, and they can be difficult to scale up and access. Such systems can be very complex, and engineers may need not only to understand all of the components but also to comprehend the set of methods in order to tune the parameters for each component.
- They typically lack self-adaptive learning. Anomaly detection approaches accumulate more and more data, which should dynamically improve performance. However, existing anomaly detectors are typically assumed to be static and often need human involvement to make changes.
- They are typically not tailored to network-specific features. A network has, among other things, a topology that provides useful additional information for anomaly detection models. All of this useful information is not properly leveraged in existing approaches.

Reinforcement learning (RL) is emerging as a hot topic as better computational resources become available. It does not require human involvement, and it can learn an optimal policy by interacting with an environment. It is a natural fit for resolving the challenges in anomaly detection, including self-adaptive learning. The objective of RL is to make decisions for an agent, from external behavior data, based on the underlying reward function. As shown in Figure 1, below, a reward function provides an agent with incentives to improve its strategy and thus receive as many rewards as possible. The agent's normal behavior is then captured by the reward function, which is inferred via RL. Using a learned reward function, one may evaluate whether a new observation from the target agent follows a normal pattern. In other words, if a new observation generates a low reward, this implies that the observation is not explained by the preferences of the agent that have been learned thus far, and that observation may be considered a potential anomaly.

Figure 1: Reinforcement Learning (RL)
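To make that detection criterion concrete, the following is a minimal sketch, not taken from the publication, of how a learned reward function might be used to flag potential anomalies; the reward model, threshold value, and all function names here are illustrative assumptions.

```python
import numpy as np

def score_observations(reward_fn, observations, threshold=-1.0):
    """Flag observations whose learned reward falls below a threshold.

    reward_fn    -- learned reward function R(observation) -> float
                    (hypothetical stand-in for the reward model inferred via RL)
    observations -- iterable of feature vectors from the target agent
    threshold    -- reward level below which an observation is treated as
                    unexplained by the learned preferences (assumed, tunable)
    """
    flags = []
    for obs in observations:
        # A low reward means the observation is poorly explained by the
        # agent's learned preferences, i.e., a potential anomaly.
        flags.append(reward_fn(obs) < threshold)
    return np.array(flags)

# Toy usage with a stand-in reward model that prefers observations near zero.
toy_reward = lambda obs: -float(np.linalg.norm(obs))
print(score_observations(toy_reward, [np.zeros(3), 5.0 * np.ones(3)],
                         threshold=-2.0))  # [False  True]
```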
A network has a very unique feature: topology information. Existing network anomaly detection approaches typically focus on single-device, single-feature detection, without considering a network's topology. However, network topology is very important for a number of reasons. As will be described below, aspects of the techniques presented herein have broad applicability, since the RL is applied to an abstraction of a topology. First, a topology is a layer-network construct, which may apply to:

- Network topology, which in the most pragmatic sense is a link-state database. This is, for example, an Open Shortest Path First (OSPF) link-state database (LSDB) topology or an Intermediate System to Intermediate System (IS-IS) topology. As well, for the general case, any topology source such as, for example, a network analysis toolkit, collectors, etc. is also relevant.
- Service topology, which is usually a graph of an ordered application of services. The most direct application is a service graph, as described in, for example, Internet Engineering Task Force (IETF) Request for Comments (RFC) 7665 (see https://tools.ietf.org/html/rfc7665#section-2.1) and RFC 8300 (see https://tools.ietf.org/html/rfc8300#section-6.4).

It is important to note that aspects of the techniques presented herein may be well applied to a cross-layer topology, mapping and correlating between a network topology and a service topology.

In summary, in the network field, topology is one of the most important perspectives, as it can, among other things, bring additional insights to the modeling process. Existing approaches do not take topology information into consideration. In light of this, techniques are presented herein that support the application of deep CNN modeling and RL (with, for example, an A2C algorithm) to the challenges of anomaly detection. Aspects of the presented techniques support a new way to model a "customer profile" by leveraging topology information.

In support of the discussion that will be presented below, it will be helpful to briefly describe the problem setting under which RL may be applied through the techniques that are presented herein. Aspects of the techniques presented herein support a CNN-based anomaly detector that is trained continually through RL to provide a self-adaptive anomaly detection system. Following the framework of RL, the environment is modeled as follows (a code sketch of this model appears after the list):

- S is the finite set of states.
- A is the finite set of actions, with A = {0, 1}, in which 1 means the given state is anomalous and 0 otherwise.
- P(s, a, s') is the dynamic/transition model for each action, i.e., the probability of changing to state s' from state s when action a is taken: P(s, a, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a).
- R(s, a) is the reward of executing action a in state s.
- γ ∈ [0, 1) is a discount factor, which weighs immediate and future rewards.
- A policy π determines how the agent chooses actions; π : S → A is a mapping from states to actions.
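The following is a minimal sketch of that environment model, assuming toy stand-ins for S, P, and R; the class name and the callables are hypothetical, not taken from the publication.

```python
import random

class AnomalyDetectionEnv:
    """Minimal sketch of the RL environment modeled above (illustrative only).

    S     -- finite set of states (hashable ids standing in for state frames)
    A     -- {0, 1}, where action 1 labels the current state as anomalous
    P     -- transition model, P(s, a, s') = Pr(s_{t+1} = s' | s_t = s, a_t = a)
    R     -- reward of executing action a in state s
    gamma -- discount factor in [0, 1), weighing immediate vs. future rewards
    """

    def __init__(self, states, transition, reward, gamma=0.9):
        self.states = list(states)    # finite set S
        self.actions = (0, 1)         # A = {0, 1}
        self.transition = transition  # callable (s, a) -> s', a sample from P
        self.reward = reward          # callable (s, a) -> float, i.e., R(s, a)
        self.gamma = gamma
        self.state = random.choice(self.states)

    def step(self, action):
        """Apply the detector's action; return (next state, reward)."""
        r = self.reward(self.state, action)
        self.state = self.transition(self.state, action)
        return self.state, r

# Toy usage: two states, where state 1 happens to be the anomalous one, so
# the reward is +1 for a correct label and -1 otherwise.
env = AnomalyDetectionEnv(
    states=[0, 1],
    transition=lambda s, a: random.choice([0, 1]),
    reward=lambda s, a: 1.0 if a == s else -1.0,
)
```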
A state value function determines the expected sum of future rewards under a particular policy, which specifies what is good in the long run:

V^π(s) = E[ Σ_{t=0}^{∞} γ^t R(s_t, a_t) | s_0 = s, a_t = π(s_t) ]

For the problem setting under aspects of the techniques presented herein, the "anomaly detector" is defined as the policy π. The optimal anomaly detector π* is the detector that satisfies the following constraint:

V^{π*}(s) ≥ V^π(s) for every policy π and every state s ∈ S

The experience E is a set of tuples defined as ⟨s, a, r, s'⟩. The variables s and s' in S indicate the states of the target system before and after the action a. In an anomaly detector, actions are selected by the anomaly detector itself; therefore, the experience records all of the behaviors of the anomaly detector.

Considering a deterministic optimal anomaly detector, it should maximize performance and, in fact, is fully determined by the accumulated reward function Q(s, a). Q(s, a) represents the accumulated reward starting from state s with action a, which is the average accumulated reward in anomaly detection when following the anomaly detector π. The goal of an anomaly detector is to continually improve the policy by learning from the experience to gain a better estimate of Q(s, a). This can be achieved by learning from the state and action history.

Figure 2, below, shows the learning process of anomaly detection, where "s" in blue represents a state and "a" in orange represents an action. From the starting state s_0, the agent takes an action a_0 and may reach one of several next states, e.g., {s_11, s_12, s_13, ...}; which state is reached is stochastic. Once some state s_1x is reached, the agent continues taking actions, continually improving the policy. There are many different learning options, including, for example, dynamic programming, Monte Carlo methods, temporal difference (TD) learning, etc. The best option may be selected based on, for example, different customer data.

Figure 2: Learning Optimal Policy from "s" (state) and "a" (action)
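As a concrete illustration of one such learning option, the following is a minimal sketch of tabular Q-learning, a temporal-difference method, improving the detector policy from ⟨s, a, r, s'⟩ experience. It reuses the hypothetical environment sketched earlier; the publication itself pairs RL with a CNN and, for example, A2C, so this tabular form is only a simplified stand-in.

```python
from collections import defaultdict
import random

def q_learning(env, episodes=500, steps=50, alpha=0.1, epsilon=0.1):
    """Tabular Q-learning sketch: improve the anomaly detector from
    <s, a, r, s'> experience via the temporal-difference update
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    Q = defaultdict(float)  # estimates of the accumulated reward Q(s, a)
    for _ in range(episodes):
        s = env.state = random.choice(env.states)
        for _ in range(steps):
            # Epsilon-greedy policy: mostly exploit Q, occasionally explore.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r = env.step(a)
            # TD target bootstraps on the best estimated next-state value.
            td_target = r + env.gamma * max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (td_target - Q[(s, a)])
            s = s_next
    # The learned detector: pi(s) = argmax_a Q(s, a).
    return {s: max(env.actions, key=lambda a: Q[(s, a)]) for s in env.states}

print(q_learning(env))  # with the toy environment above: {0: 0, 1: 1}
```

With the toy environment, the learned policy converges to labeling state 1 as anomalous (action 1) and state 0 as normal (action 0), which is exactly the behavior the toy reward function incentivizes.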
Further in support of the discussion that will be presented below, it will be helpful to briefly discuss CNN with RL. As shown in Figure 3, below, customer device data and their topology information may be converted into "state frames." Each time series may be transformed into a set of multi-dimensional data instances using the sliding-window method. The "state" image that is generated from topology information is high-dimensional data, as there are hundreds or thousands of devices in a customer network and each device may have hundreds of features. Hence, a CNN may be leveraged to reduce the dimensionality and extract the lower-level and higher-level features out of the original "state" image, in addition to finding patterns and correlations.
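A minimal sketch of that state-frame construction and CNN feature extraction follows; the window length, layer sizes, device/feature counts, and the use of PyTorch are illustrative assumptions rather than details from the publication.

```python
import numpy as np
import torch
import torch.nn as nn

def sliding_windows(series, window=32, stride=1):
    """Turn a (devices, features, T) time series into state frames of
    shape (num_windows, devices, features, window) via a sliding window."""
    frames = [series[..., start:start + window]
              for start in range(0, series.shape[-1] - window + 1, stride)]
    return np.stack(frames)

class StateEncoder(nn.Module):
    """Small CNN that reduces a high-dimensional "state" image (devices x
    features x time, built from topology plus telemetry) to a compact
    embedding, extracting lower-level and higher-level features."""

    def __init__(self, in_channels, embed_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # low-level features
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),           # higher-level features
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                               # collapse spatial dims
        )
        self.head = nn.Linear(32, embed_dim)

    def forward(self, x):  # x: (batch, devices, features, window)
        return self.head(self.conv(x).flatten(1))

# Toy usage: 8 devices, 4 features each, 200 time steps.
series = np.random.randn(8, 4, 200).astype(np.float32)
frames = sliding_windows(series)                     # (169, 8, 4, 32)
encoder = StateEncoder(in_channels=frames.shape[1])
states = encoder(torch.from_numpy(frames))           # (169, 64) state embeddings
```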
Under aspects of the techniques presented herein…