P-Sketch: A Fast and Accurate Sketch for Persistent Item Lookup

IEEE-ACM TRANSACTIONS ON NETWORKING(2024)

引用 0|浏览1
暂无评分
摘要
In large data streams consisting of sequences of data items, those appearing over a long period of time are regarded as persistent. Compared with frequent items, persistent items do not necessarily hold large amounts of data and thus may hamper the effectiveness of vanilla volume-based detectors. Identifying persistent items plays a crucial role in a range of areas such as fraud detection and network management. Fast detection of persistent items in massive streams is however challenging due to the inherently high data rates, while state-of-the-art persistent item lookup solutions routinely require large enough memory to attain high accuracy, which questions the feasibility of deploying them in practice. In this paper, we introduce P-Sketch, a novel approach to persistent item lookup that achieves high accuracy even with small memory (L1 Cache) budgets and maintains high update speed across different settings. Specifically, we introduce the concept of arrival continuity (hotness) that counts the number of consecutive windows in which an item appears, to effectively protect persistent items from being wrongly replaced by non-persistent ones. Through meticulous data analysis, we also reveal that items with higher persistence tend to possess a stronger hotness than non-persistent ones. Thus, we harness the information of persistence and hotness, and employ a probability-based replacement strategy to achieve a good balance between memory efficiency, lookup accuracy, and update speed. We also present a theoretical analysis of the performance of the proposed P-Sketch. Through trace-driven emulations, we demonstrate that our P-Sketch yields average F1 score and update throughput gains of up to 10.32x and respectively 2.9x, over existing schemes. Lastly, we show how to further boost the P-Sketch's update speed with Single Instruction Multiple Data (SIMD) instructions.
更多
查看译文
关键词
Data stream mining,persistent items,sketch
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要