Tight-Sketch: A High-Performance Sketch for Heavy Item-Oriented Data Stream Mining with Limited Memory Size

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 (2023)

Abstract
Accurate and fast data stream mining is critical and fundamental to many tasks, including time series database handling, big data management and machine learning. Different heavy-based detection tasks, such as heavy hitter, heavy changer, persistent item and significant item detection, have drawn much attention from both industry and academia. Unfortunately, as we show, growing data stream speeds and the limited memory (L1 cache) available for real-time processing make it hard for existing schemes to achieve high detection accuracy, high memory efficiency, and fast update throughput simultaneously. To tackle this conundrum, we propose a versatile and elegant sketch framework named Tight-Sketch, which supports a spectrum of heavy-based detection tasks. Considering that most items are cold (non-heavy/persistent/significant) in practice, we apply different eviction treatments to different types of items, discarding potentially cold items as soon as possible and offering more protection to hot (heavy/persistent/significant) ones. In addition, we propose an eviction method that follows a stochastic decay strategy, so that Tight-Sketch incurs only small one-sided errors (no overestimation). We present a theoretical analysis of the error bounds and conduct extensive experiments on diverse detection tasks to demonstrate that Tight-Sketch significantly outperforms existing methods in terms of accuracy and update speed. Lastly, we accelerate Tight-Sketch's update throughput by up to 36% with Single Instruction Multiple Data (SIMD) instructions.
Keywords
data stream mining,heavy item,persistent item,significant item,sustained arrival strength
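
The abstract does not spell out Tight-Sketch's eviction rules, so the following is only a toy illustration of the general idea of stochastic decay with one-sided error, not the authors' algorithm. It assumes a single array of buckets, each holding one full key and a counter; the class name `DecaySketch`, the parameter `decay_base`, and the decay probability `decay_base ** (-count)` are all hypothetical choices made for this sketch. Because a counter only grows when its own key arrives, an estimate can never exceed the true frequency.

```python
import random
import zlib
from collections import Counter


class DecaySketch:
    """Toy single-array sketch with probabilistic decay (illustrative only)."""

    def __init__(self, num_buckets, decay_base=1.08, seed=0):
        self.keys = [None] * num_buckets   # key currently owning each bucket
        self.counts = [0] * num_buckets    # arrivals credited to that key
        self.decay_base = decay_base       # assumed decay parameter (> 1)
        self.rng = random.Random(seed)     # seeded for reproducible runs

    def _index(self, key):
        # crc32 gives a hash that is stable across Python processes.
        return zlib.crc32(key.encode("utf-8")) % len(self.keys)

    def update(self, key):
        i = self._index(key)
        if self.keys[i] == key:
            # The resident key arrived again: count it.
            self.counts[i] += 1
        elif self.counts[i] == 0:
            # Empty bucket: the new key takes it over.
            self.keys[i] = key
            self.counts[i] = 1
        elif self.rng.random() < self.decay_base ** (-self.counts[i]):
            # Collision: decay the resident with a probability that shrinks
            # as its counter grows, so hot items are better protected.
            self.counts[i] -= 1
            if self.counts[i] == 0:
                # Resident fully decayed: evict it and admit the newcomer.
                self.keys[i] = key
                self.counts[i] = 1

    def estimate(self, key):
        i = self._index(key)
        return self.counts[i] if self.keys[i] == key else 0


def true_counts(stream):
    """Exact frequencies, used only to check the sketch."""
    return Counter(stream)
```

A cold item that collides with a hot resident rarely manages to decay it, while a cold resident with a small counter is evicted quickly. Comparing `estimate(k)` against `true_counts(stream)[k]` on any stream should show estimates that are at or below the exact frequencies.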