Analysis of Knuth's Sampling Algorithm D and D'

Mridul Nandi, Soumit Paul

CoRR(2023)

引用 0|浏览1
暂无评分
摘要
In this research paper, we address the Distinct Elements estimation problem in the context of streaming algorithms. The problem involves estimating the number of distinct elements in a given data stream $\mathcal{A} = (a_1, a_2,\ldots, a_m)$, where $a_i \in \{1, 2, \ldots, n\}$. Over the past four decades, the Distinct Elements problem has received considerable attention, theoretically and empirically, leading to the development of space-optimal algorithms. A recent sampling-based algorithm proposed by Chakraborty et al.[11] has garnered significant interest and has even attracted the attention of renowned computer scientist Donald E. Knuth, who wrote an article on the same topic [6] and called the algorithm CVM. In this paper, we thoroughly examine the algorithms (referred to as CVM1, CVM2 in [6] and DonD, DonD' in [6]. We first unify all these algorithms and call them cutoff-based algorithms. Then we provide an approximation and biasedness analysis of these algorithms.
更多
查看译文
关键词
knuth,algorithm
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要