Tiered data management system: Accelerating data processing on HPC systems

Future Generation Computer Systems (2019)

Abstract
The explosion of scientific data generated by large-scale simulations and advanced sensors makes scientific workflows more complex and more data-intensive. Supporting these data-intensive workflows on high-performance computing (HPC) systems presents new challenges in data management because of their scale, coordination behaviours, and overall complexity. In this paper, we propose the Tiered Data Management System (TDMS) to accelerate scientific workflows on HPC systems. TDMS prevents repetitive data movement by providing efficient data sharing on top of a tiered storage architecture. Customized data management for common workflow access patterns allows users to make full use of the advantages of the different storage tiers. An extended application interface, which supports user-defined data management strategies, strengthens the system's ability to handle diverse storage architectures and application scenarios. Moreover, we propose a data-aware task scheduling module that launches each task on the compute nodes where the locality of its required data can be exploited to the greatest extent. We build a prototype and deploy it on a typical HPC system. We evaluate TDMS with realistic workflows, and the experiments show that it improves I/O performance and provides up to a 1.54x speedup for data-intensive workflows compared with the Lustre file system.
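The data-aware scheduling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the TDMS implementation: the function name `pick_node`, the `placement` map, and the "most locally resident bytes wins" rule are all assumptions made here for illustration, since the abstract does not describe the actual policy.

```python
from typing import Dict, List, Set


def pick_node(
    required: List[str],
    sizes: Dict[str, int],
    placement: Dict[str, Set[str]],
    nodes: List[str],
) -> str:
    """Return the candidate node that already holds the most bytes of the
    task's required files (hypothetical locality heuristic, not TDMS's own).

    required  -- names of the files the task reads
    sizes     -- file name -> size in bytes
    placement -- file name -> set of nodes holding a local copy
    nodes     -- candidate compute nodes, in preference order
    """
    def local_bytes(node: str) -> int:
        # Sum the sizes of required files that are resident on this node.
        return sum(
            sizes[f] for f in required if node in placement.get(f, set())
        )

    # max() keeps the first node on ties, so list order acts as a tie-breaker.
    return max(nodes, key=local_bytes)
```

Under this heuristic, a task is placed where the least data would have to be moved from other nodes or from the parallel file system; a real scheduler would also have to weigh node load and storage-tier capacity.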
Keywords
HPC, Big data, Scientific workflows, Data management