1. Analysis of Knuth's Sampling Algorithm D and D'
- Author
-
Nandi, Mridul and Paul, Soumit
- Subjects
Computer Science - Data Structures and Algorithms ,F.2.0 - Abstract
In this research paper, we address the Distinct Elements estimation problem in the context of streaming algorithms. The problem involves estimating the number of distinct elements in a given data stream $\mathcal{A} = (a_1, a_2,\ldots, a_m)$, where $a_i \in \{1, 2, \ldots, n\}$. Over the past four decades, the Distinct Elements problem has received considerable attention, theoretically and empirically, leading to the development of space-optimal algorithms. A recent sampling-based algorithm proposed by Chakraborty et al.[11] has garnered significant interest and has even attracted the attention of renowned computer scientist Donald E. Knuth, who wrote an article on the same topic [6] and called the algorithm CVM. In this paper, we thoroughly examine the algorithms (referred to as CVM1, CVM2 in [11] and DonD, DonD' in [6]. We first unify all these algorithms and call them cutoff-based algorithms. Then we provide an approximation and biasedness analysis of these algorithms., Comment: We have provided an unbiased analysis (using exactly the same idea as the previous version) for the continuous score distribution instead of the discrete version
- Published
- 2023