Sliding Window Sampling over Data Stream – a Solution Based on Devil’s Staircases

2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), 2023

Abstract
The paper concerns sampling from a data stream $\{S_i\}$: at each moment $t$ the sampler should hold a value $S_{t-j}$, where $j \in \{0,\ldots,n-1\}$ is chosen according to an a priori specified probability distribution $D$ on $\{0,\ldots,n-1\}$. Both $D$ and the window size $n$ are fixed and do not depend on $t$. We assume that the sampler has constant-size memory while $n$ may be large, so the sampler can remember only a few of the last $n$ values of the stream. The difficulty is that the window of the last $n$ elements changes at each step, and whenever we have to resample, almost all of the values from which we must choose have already been forgotten. The case of the uniform distribution $D$ was considered by Braverman, Ostrovsky, and Zaniolo in 2013. We present an alternative generic approach based on specific Markov chains called devil’s staircases. Unlike the previous solution, it is not limited to the uniform distribution: it generates a sample according to any admissible distribution on the window of size $n$ and uses $\mathrm{O}(1)$ memory. We provide sufficient conditions for the distribution $D$ to be admissible. Although the class of such distributions is quite wide from the point of view of practical applications, we show some natural limitations of this class.
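To make the problem statement concrete, the following is a minimal reference sketch of the sampling specification only. It is not the devil's staircase construction from the paper: it stores the whole window, so it uses $\mathrm{O}(n)$ memory instead of $\mathrm{O}(1)$, and it redraws $j$ independently at every step. The function name `naive_window_sampler` and the representation of $D$ as a list `weights` of length $n$ are assumptions made for illustration.

```python
import random
from collections import deque
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def naive_window_sampler(
    stream: Iterable[T],
    weights: Sequence[float],  # weights[j] is proportional to D(j), j = 0..n-1
    rng: random.Random,
) -> Iterator[T]:
    """Yield, at each time t >= n-1, the value S_{t-j} with j drawn from D.

    Baseline only: the whole window is stored (O(n) memory), which is
    exactly what a constant-memory sampler is not allowed to do.
    """
    n = len(weights)
    window: deque = deque(maxlen=n)  # oldest value is dropped automatically
    for value in stream:
        window.appendleft(value)  # now window[j] == S_{t-j}
        if len(window) < n:
            continue  # the window of size n is not yet full
        j = rng.choices(range(n), weights=weights, k=1)[0]
        yield window[j]


if __name__ == "__main__":
    rng = random.Random(0)
    # D puts all its mass on j = 2, so the sampler always holds S_{t-2}.
    print(list(naive_window_sampler(range(6), [0.0, 0.0, 1.0], rng)))
```

With a distribution concentrated on a single offset the output is deterministic, which makes the indexing convention easy to check. For any other `weights` the sketch matches the required distribution of $j$ at each step, but only because every one of the last $n$ values is kept in memory.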
Key words
data stream, sliding window, random sampling, Markov chain, devil’s staircase