JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation.

Proc. ACM Manag. Data (2023)

Abstract
Inner-product estimation is the basis of many important tasks in a variety of big-data scenarios, including measuring the similarity of streams in data stream processing, estimating join sizes in databases, and analyzing cosine similarity in various applications. Sketches, a class of probabilistic algorithms, are promising for inner-product estimation. However, existing sketch solutions suffer from low accuracy because they neglect the high skewness of real data. In this paper, we design a new sketch algorithm for accurate and unbiased inner-product estimation, namely JoinSketch. To improve accuracy, JoinSketch consists of multiple components and records items with different frequencies in different components. We theoretically prove that JoinSketch is unbiased and has lower variance than the well-known AGMS and Fast-AGMS sketches. The experimental results show that JoinSketch improves accuracy by 10 times on average while maintaining comparable speed. All code is open-sourced on GitHub.
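For context, the following is a minimal sketch of the Fast-AGMS baseline that the abstract compares against. It is not JoinSketch itself, whose component layout this abstract does not specify. Several details are assumptions made for illustration only: items are non-negative integers below the prime `P`, and the names `FastAGMS`, `update`, `inner_product` and `exact_inner_product` are hypothetical. The hash families are also illustrative choices: a pairwise-independent bucket hash, plus the parity of a degree-3 polynomial used as the ±1 sign. Each row's dot product of counter arrays estimates the inner product (join size) Σ_x f(x)·g(x) with near-zero bias. Taking the median across rows reduces the chance of a large error.

```python
import random
import statistics
from collections.abc import Mapping

P = 2_147_483_647  # Mersenne prime 2^31 - 1; items are assumed to be ints in [0, P)


class FastAGMS:
    """Illustrative Fast-AGMS sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth: int, width: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.depth, self.width, self.seed = depth, width, seed
        # Per-row coefficients for the bucket hash (degree 1, pairwise independent).
        self.bucket_coef = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]
        # Per-row coefficients for the sign hash (degree 3, 4-wise independent values).
        self.sign_coef = [[rng.randrange(P) for _ in range(4)] for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, x: int) -> int:
        a, b = self.bucket_coef[row]
        return ((a * x + b) % P) % self.width

    def _sign(self, row: int, x: int) -> int:
        c0, c1, c2, c3 = self.sign_coef[row]
        v = (((c3 * x + c2) * x + c1) * x + c0) % P
        # The parity bit maps v to +1 or -1 with almost exactly equal probability.
        return 1 if v & 1 else -1

    def update(self, x: int, count: int = 1) -> None:
        """Add `count` occurrences of item x: one signed counter per row."""
        for r in range(self.depth):
            self.table[r][self._bucket(r, x)] += self._sign(r, x) * count

    def inner_product(self, other: "FastAGMS") -> float:
        """Estimate sum_x f(x) * g(x); both sketches must share shape and seed."""
        if (self.depth, self.width, self.seed) != (other.depth, other.width, other.seed):
            raise ValueError("sketches must use the same depth, width and seed")
        row_estimates = [
            sum(a * b for a, b in zip(self.table[r], other.table[r]))
            for r in range(self.depth)
        ]
        # Each row is a near-unbiased estimate; the median damps outlier rows.
        return statistics.median(row_estimates)


def exact_inner_product(f: Mapping[int, int], g: Mapping[int, int]) -> int:
    """Exact join size sum_x f(x) * g(x), for comparison with the sketch."""
    return sum(c * g.get(x, 0) for x, c in f.items())


if __name__ == "__main__":
    f = {x: x % 7 + 1 for x in range(200)}       # item frequencies in stream A
    g = {x: x % 5 + 1 for x in range(100, 300)}  # item frequencies in stream B
    sa, sb = FastAGMS(7, 4096, seed=3), FastAGMS(7, 4096, seed=3)
    for x, c in f.items():
        sa.update(x, c)
    for x, c in g.items():
        sb.update(x, c)
    print("exact:", exact_inner_product(f, g), "estimate:", sa.inner_product(sb))
```

Running the script prints the exact join size next to the sketch estimate for two synthetic streams. The variance of each row grows with the second moments of both streams and shrinks roughly as 1/width. As a result, a few very frequent items can dominate the error. This is the skewness problem that the abstract says JoinSketch addresses by recording items of different frequencies in separate components.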
Keywords
JoinSketch algorithm, estimation, inner-product