Cluster-Reduce: Compressing Sketches for Distributed Data Streams

Knowledge Discovery and Data Mining(2021)

Cited 8|Views53
No score
Abstract
ABSTRACTSketches, a type of probabilistic algorithms, have been widely accepted as the approximate summary of data streams. Compressing sketches is the best choice in distributed data streams to reduce communication overhead. The ideal compression algorithm should meet the following three requirements: high efficiency of compression procedure, support of direct query without decompression, and high accuracy of compressed sketches. However, no prior work can meet these requirements at the same time. Especially, the accuracy is poor after compression using existing methods. In this paper, we propose Cluster-Reduce, a framework for compressing sketches, which can meet all three requirements. Our key technique nearness clustering rearranges the adjacent counters with similar values in the sketch to significantly improve the accuracy. We use Cluster-Reduce to compress four kinds of sketches in two use-cases: distributed data streams and distributed machine learning. Extensive experimental results show that Cluster-Reduce can achieve up to 60 times smaller error than prior works. The source codes of Cluster-Reduce are available at Github anonymously[1].
More
Translated text
Key words
Sketch Compression, Distributed Data Streams, Distributed ML
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined