谷歌浏览器插件
订阅小程序
在清言上使用

Design and Implementation of Burst Buffer Over-Subscription Scheme for HPC Storage Systems

IEEE ACCESS(2023)

引用 0|浏览29
暂无评分
摘要
Burst Buffer is widely used in supercomputer centers to bridge the performance gap between computational power and the high-performance I/O systems. The primary role of Burst Buffer is to temporarily absorb the bursty I/O and reduce the heavy access on Parallel File System (PFS). However, the job resource manager on High-Performance Computer (HPC) systems prefers to use a dedicated Burst Buffer allocation approach, which eventually leads to the severely underutilized Burst Buffer resource. To improve the efficiency of using the expensive Burst Buffer resource, we analyze the I/O patterns on Burst Buffer in depth. We propose Burst Buffer over-subscription allocation method, which improves Burst Buffer utilization by allowing each job to access Burst Buffer only during its I/O phases so that the jobs can overlap each other. Furthermore, we develop a new I/O congestion-aware scheduler and a transparent data management system between Burst Buffer and PFS. Our approach also reduces the memory overhead and improves the data persistence of the data management system by adapting the persistent memory. With the proposed approach, not only the Burst Buffer utilization can be improved, but also HPC applications can achieve high I/O performance by exploiting the powerful Burst Buffer hardware capabilities. Experimental results show that BBOS can improve Burst Buffer utilization by up to 120% while more stable and higher checkpoint performance is guaranteed even under high I/O loads compared to other state-of-the-art schedulers. Besides, our approach can improve the hit ratio of restart requests by up to 96.4% and provides up to 210% higher restart throughput on Burst Buffer.
更多
查看译文
关键词
Resource management,Supercomputers,Memory management,Hardware,File systems,Engines,Writing,Storage automation,Publishing,Burst buffer,checkpoint,demotion,over-subscription,parallel file system,restart
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要