Scalable distance-based outlier detection over high-volume data streams
ICDE (2014)
Abstract
The discovery of distance-based outliers from huge volumes of streaming data is critical for modern applications ranging from credit card fraud detection to moving object monitoring. In this work, we propose the first general framework to handle the three major classes of distance-based outliers in streaming environments, including the traditional distance-threshold-based and the nearest-neighbor-based definitions. Our LEAP framework encompasses two general optimization principles applicable across all three outlier types. First, our “minimal probing” principle uses a lightweight probing operation to gather minimal yet sufficient evidence for outlier detection. This principle overturns the state-of-the-art methodology, which requires routinely conducting expensive complete neighborhood searches to identify outliers. Second, our “lifespan-aware prioritization” principle leverages the temporal relationships among stream data points to prioritize the order in which they are processed during probing. Guided by these two principles, we design an outlier detection strategy that is proven to be optimal in the CPU costs needed to determine the outlier status of any data point during its entire life. Our comprehensive experimental studies, using both synthetic and real streaming data, demonstrate that our methods are 3 orders of magnitude faster than state-of-the-art methods for a rich diversity of scenarios tested, yet scale to high-dimensional streaming data.
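The two principles can be illustrated with a minimal sketch for the distance-threshold definition, where a point is an outlier if fewer than k other points in the current window lie within distance r of it. This is not the paper's LEAP algorithm. It assumes a count-based window stored as a list ordered from oldest to newest, Euclidean distance, and the hypothetical names `probe`, `window`, `r`, and `k`. The scan stops as soon as k neighbors are found (minimal probing) and visits the newest points first, since they expire last (a simple stand-in for lifespan-aware prioritization).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def probe(window: List[Point], idx: int, r: float, k: int) -> Tuple[bool, int]:
    """Return (is_outlier, distance_computations) for window[idx].

    Hypothetical helper: `window` is assumed to be ordered oldest -> newest.
    """
    target = window[idx]
    found = 0   # neighbors within distance r seen so far
    probed = 0  # number of distance computations performed
    # Visit the newest points first: they leave the window last, so any
    # neighbor evidence they provide remains valid for the longest time.
    for j in range(len(window) - 1, -1, -1):
        if j == idx:
            continue
        probed += 1
        if math.dist(target, window[j]) <= r:
            found += 1
            if found >= k:
                # Enough evidence: stop instead of scanning the full window.
                return False, probed
    return True, probed


# Illustrative one-dimensional window (made-up values).
window = [(0.0,), (0.1,), (0.2,), (5.0,), (0.15,), (0.05,)]
print(probe(window, 0, r=0.5, k=2))  # inlier after 2 distance computations
print(probe(window, 3, r=0.5, k=2))  # outlier after scanning all 5 other points
```

In this toy window the inlier is settled after two distance computations, whereas a complete neighborhood search would always compute all five. Only a point that turns out to be an outlier forces the full scan.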
Keywords
stream data points, scalable distance-based outlier detection, pattern recognition, high-volume data streams, lifespan-aware prioritization principle, CPU costs, minimal probing principle, probing process, streaming environments, distance-threshold, general framework, high dimensional streaming data, data handling, distance-based outliers, moving object monitoring, general optimization principles, LEAP framework, nearest-neighbor-based definitions, credit card fraud detection, neighborhood searches, lightweight probing operation, optimization