Exploring the Potential of Fast Delta Encoding: Marching to a Higher Compression Ratio

2020 IEEE International Conference on Cluster Computing (CLUSTER) (2020)

Abstract
Delta compression (also called delta encoding) is a data reduction technique that calculates the differences (i.e., the delta) among very similar files and chunks, and is thus widely used for optimizing synchronization replication, backup/archival storage, cache compression, etc. However, delta compression is costly because of its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either encode slowly, such as Xdelta and Zdelta, or achieve a low compression ratio, such as Ddelta and Edelta. In this paper, we propose Gdelta, a fast delta encoding approach with a high compression ratio. Gdelta improves the delta encoding speed by employing an improved fast Gear-based rolling hash for scanning fine-grained words and a quick array-based indexing scheme for word-matching; after word-matching, it batch-compresses the remaining data to further improve the compression ratio. Our evaluation results, driven by six real-world datasets, suggest that Gdelta achieves encoding/decoding speedups of 2X~4X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%~120%.
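The three steps named in the abstract (Gear-based rolling hash, array-based index, batch compression of the unmatched remainder) can be illustrated with a minimal sketch. This is not the Gdelta implementation: the word size `WORD`, the index size `INDEX_BITS`, the per-byte shift, the random Gear table, the helper names (`word_hashes`, `encode`, `pack`, `decode`), and the use of `zlib` as the batch compressor are all assumptions made for illustration.

```python
import random
import zlib

WORD = 8                  # bytes per word (assumed value)
SHIFT = 64 // WORD        # bits shifted per byte; a byte leaves the state after WORD steps
MASK = (1 << 64) - 1
INDEX_BITS = 16           # log2 of the index array size (assumed value)

_rng = random.Random(1)   # fixed seed: the Gear table is random but reproducible
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def word_hashes(data):
    """Yield (offset, slot) for every WORD-byte window of data."""
    h = 0
    for i, b in enumerate(data):
        h = ((h << SHIFT) + GEAR[b]) & MASK       # Gear-style rolling update
        if i >= WORD - 1:
            yield i - WORD + 1, h >> (64 - INDEX_BITS)


def encode(base, target):
    """Return a list of ("copy", base_offset, length) / ("insert", bytes) ops."""
    index = [-1] * (1 << INDEX_BITS)              # flat array, no hash-table chains
    for off, slot in word_hashes(base):
        if index[slot] < 0:                       # keep the first occurrence only
            index[slot] = off
    slots = [slot for _, slot in word_hashes(target)]
    ops, literal, i, n = [], bytearray(), 0, len(target)
    while i < n:
        cand = index[slots[i]] if i < len(slots) else -1
        if cand >= 0 and base[cand:cand + WORD] == target[i:i + WORD]:
            length = WORD                         # verified match; extend it forward
            while (cand + length < len(base) and i + length < n
                   and base[cand + length] == target[i + length]):
                length += 1
            if literal:
                ops.append(("insert", bytes(literal)))
                literal.clear()
            ops.append(("copy", cand, length))
            i += length
        else:
            literal.append(target[i])             # unmatched byte
            i += 1
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def pack(ops):
    """Gather all unmatched bytes into one stream and compress it in one batch."""
    instructions, literals = [], bytearray()
    for op in ops:
        if op[0] == "copy":
            instructions.append(op)
        else:
            instructions.append(("insert", len(op[1])))
            literals += op[1]
    return instructions, zlib.compress(bytes(literals))  # zlib is a stand-in


def decode(base, instructions, packed_literals):
    """Rebuild the target from the base, the instructions, and the literals."""
    literals, out, pos = zlib.decompress(packed_literals), bytearray(), 0
    for op in instructions:
        if op[0] == "copy":
            out += base[op[1]:op[1] + op[2]]
        else:
            out += literals[pos:pos + op[1]]
            pos += op[1]
    return bytes(out)
```

A round trip would look like `decode(base, *pack(encode(base, target)))`, which should reproduce `target`. The sketch hashes every window of the target up front for simplicity; a tuned encoder would presumably skip hashing inside regions already covered by a match.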
Keywords
Data reduction, compression, delta encoding