A Middle-Ware Approach to Leverage the Distributed Data Deduplication Capability on HPC and Cloud Storage Systems

2020 IEEE International Conference on Big Data (Big Data), 2020

Abstract
The unprecedented growth in the volume and diversity of data in today's HPC and enterprise computing environments has posed challenging problems for data management and data-space reduction. More than 71% of the enterprise and HPC communities are seeking de-duplication technologies to reduce cost and increase storage efficiency, making data de-duplication a critical area of active research and development. Current data de-duplication systems are largely hardware-, system-, and platform-dependent, and most implementations are proprietary software rather than open source. In this paper, we present a new middle-ware design and implementation approach, named D3M, that adds a distributed data de-duplication feature to existing file and object storage systems. We also integrate the proposed D3M middle-ware with Red Hat's Linux device-layer de-duplication and compression driver, VDO (Virtual Data Optimizer). With these two layers of de-duplication support, we accommodate both client-side and server-side data de-duplication. Finally, we evaluate our bilayer data de-duplication middle-ware on HPC and enterprise data sets to illustrate its benefits and advantages.
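The chunk-and-fingerprint idea underlying de-duplication systems like the one the abstract describes can be illustrated with a minimal sketch. The paper does not specify D3M's chunking or hashing scheme; the fixed-size chunking and SHA-256 fingerprints below are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def dedup_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store one copy per unique
    SHA-256 fingerprint -- the core space-saving idea of chunk-level dedup.
    Returns the unique-chunk store and an ordered fingerprint 'recipe'
    from which the original stream can be rebuilt."""
    store = {}   # fingerprint -> unique chunk bytes
    recipe = []  # ordered fingerprints referencing chunks in `store`
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep only the first copy
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original byte stream from the chunk store."""
    return b"".join(store[fp] for fp in recipe)
```

For example, a 16 KiB stream containing three identical 4 KiB chunks is stored as only two unique chunks plus a four-entry recipe; the ratio of recipe length to store size gives the achievable space reduction for that data set.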
Keywords
De-duplication, Storage, HPC, Middle-ware, Data Chunking, Performance evaluation