An Information-Theoretic Analysis of Deduplication.

International Symposium on Information Theory (2019)

Abstract
Deduplication finds and removes long-range data duplicates. It is commonly used in cloud and enterprise server settings and has been successfully applied to primary, backup, and archival storage. Despite its practical importance as a source-coding technique, its analysis from the point of view of information theory is missing. This paper provides such an information-theoretic analysis of data deduplication. It introduces a new source model adapted to the deduplication setting. It formalizes both fixed and variable-length deduplication schemes, and it introduces a novel, multi-chunk deduplication scheme. It then provides an analysis of these three deduplication variants, emphasizing the importance of boundary synchronization between source blocks and deduplication chunks. The proposed multi-chunk deduplication scheme is shown to be order optimal under fairly mild assumptions.
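To make the idea of fixed-length deduplication and boundary synchronization concrete, here is a minimal sketch. It is not the paper's formal scheme or source model: the function names `dedup_fixed` and `rebuild`, the use of SHA-256 digests as chunk identifiers, and the toy input are all assumptions chosen for illustration.

```python
import hashlib


def dedup_fixed(data: bytes, chunk_len: int):
    """Split data into fixed-length chunks and keep each distinct chunk once.

    Hypothetical helper for illustration only.
    """
    store = {}   # digest -> chunk bytes (the dictionary of unique chunks)
    recipe = []  # ordered digests needed to rebuild the original data
    for i in range(0, len(data), chunk_len):
        chunk = data[i:i + chunk_len]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store a chunk only the first time it is seen
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and the recipe."""
    return b"".join(store[d] for d in recipe)


block = b"ABCDEFGH"
aligned = block * 4         # repeated blocks start exactly on chunk boundaries
shifted = b"x" + block * 4  # a one-byte prefix shifts every block off the boundaries

store_a, recipe_a = dedup_fixed(aligned, 8)
store_s, recipe_s = dedup_fixed(shifted, 8)

print(len(store_a), len(recipe_a))  # unique chunks vs. total chunks, aligned input
print(len(store_s), len(recipe_s))  # unique chunks vs. total chunks, shifted input
```

In this toy input, the aligned data collapses to a single stored chunk, while the one-byte shift makes the chunker store more distinct chunks for essentially the same content. Variable-length (content-defined) chunking is the usual way to reduce this sensitivity to alignment.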
Keywords
Servers, Synchronization, Redundancy, Standards, Data models, Virtual machining, Information theory