Bayesian Sketches for Volume Estimation in Data Streams

Francesco Da Dalt, Simon Scherrer, Adrian Perrig

Proc. VLDB Endow. (2022)

Abstract
In large data streams of items, each attributable to a certain key and possessing a certain volume, the aggregate volume associated with a key is difficult to estimate in a way that is both efficient and accurate. On the one hand, exact counting with dedicated counters incurs unacceptable overhead during stream processing. On the other hand, sketch algorithms, i.e., approximate-counting techniques that share counters among keys, have suffered from a trade-off between accuracy and query efficiency: classic sketch algorithms compute rough estimates efficiently, whereas more recent proposals yield highly accurate estimates at the cost of greatly increased computation time. In this work, we propose three sketch algorithms that overcome this trade-off, computing highly accurate estimates with lightweight procedures. To reconcile these desiderata, we employ novel estimation methods that rely on Bayesian probability theory, counter-cardinality information, and basic machine-learning techniques. The combination of these techniques enables highly accurate estimates, which we demonstrate through both a theoretical worst-case analysis and an experimental evaluation. Concretely, our sketches efficiently produce volume estimates with an average relative error below 4%, which previous methods could only achieve with computations that are several orders of magnitude more expensive.
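To make the "classic sketch" baseline concrete, the following is a minimal sketch of a Count-Min sketch used for per-key volume estimation. It is not the Bayesian estimator proposed in the paper; it only illustrates how keys share counters and why the cheap query yields rough estimates. The class name `CountMinSketch`, the salted BLAKE2b hashing scheme, and the chosen width and depth are illustrative assumptions.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Classic Count-Min sketch: `depth` rows of `width` counters shared by all keys."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Hypothetical hashing scheme: derive one hash per row by salting the key
        # with the row number and hashing it with BLAKE2b.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key: str, volume: int) -> None:
        # Add the item's (non-negative) volume to one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += volume

    def query(self, key: str) -> int:
        # Each of the key's counters holds its true volume plus the volumes of
        # colliding keys, so the minimum is the tightest available over-estimate.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


# Usage: a tiny stream of (key, volume) items.
stream = [("a", 100), ("b", 40), ("a", 60), ("c", 5)]
cms = CountMinSketch(width=64, depth=4)
for key, vol in stream:
    cms.update(key, vol)
print(cms.query("a"))  # a value >= 160; exactly 160 unless "a" collides in every row
```

The query costs only `depth` counter lookups, which is what makes classic sketches efficient; the price is that any collision inflates the estimate, and the minimum over rows cannot correct for it. This accuracy gap is what the abstract says the proposed Bayesian estimators address.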
Keywords
volume estimation, Bayesian sketches, data