An Application-Level Synchronous Checkpoint-Recover Method for Parallel CFD Simulation

Computational Science and Engineering (2013)

Abstract
High Performance Computing (HPC) is increasingly used to accelerate Computational Fluid Dynamics (CFD) simulation. However, large-scale parallel CFD simulation faces serious reliability problems, and fault-tolerance techniques must be adopted to ensure its efficient execution. In this paper, we propose an application-level synchronous checkpoint-recover method for parallel CFD simulation that builds on the application features of CFD simulation. In this method, the periodic snapshot output of the CFD simulation is naturally treated as a blocking coordinated checkpoint, and all processes can resume execution from the latest checkpoint after an arbitrary number of processes fail. We design a synchronous checkpoint-recovery framework for CFD simulation and implement it in the open-source software OpenFOAM. Experimental results demonstrate that our method supports fault tolerance in large-scale parallel CFD applications with very little overhead beyond the original cost of the periodic CFD snapshot output.
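The idea of reusing periodic snapshot output as a coordinated checkpoint can be illustrated with a small sketch. This is not the paper's OpenFOAM implementation: it is a single-process Python stand-in in which the "ranks" are simulated in a loop, the solver step is a trivial increment, and every name (`write_snapshot`, `latest_complete_step`, `run`, the `step_NNNNNN/rank_R.json` layout) is hypothetical. The one point it is meant to show is the recovery rule: a snapshot counts as a checkpoint only if every rank wrote it, and a restart resumes all ranks from the newest such snapshot.

```python
import json
import os
import tempfile


def write_snapshot(root, step, rank, state):
    """Write one rank's state for a step; the rename keeps partial files invisible."""
    step_dir = os.path.join(root, f"step_{step:06d}")
    os.makedirs(step_dir, exist_ok=True)
    tmp = os.path.join(step_dir, f"rank_{rank}.json.tmp")
    final = os.path.join(step_dir, f"rank_{rank}.json")
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, final)


def latest_complete_step(root, n_ranks):
    """Return the newest step for which every rank has a snapshot, or None."""
    step_dirs = sorted(
        (d for d in os.listdir(root) if d.startswith("step_")), reverse=True
    )
    for d in step_dirs:
        if all(
            os.path.exists(os.path.join(root, d, f"rank_{r}.json"))
            for r in range(n_ranks)
        ):
            return int(d.split("_")[1])
    return None


def load_snapshot(root, step, rank):
    """Read back one rank's state from a given snapshot step."""
    with open(os.path.join(root, f"step_{step:06d}", f"rank_{rank}.json")) as f:
        return json.load(f)


def run(root, n_ranks, n_steps, interval, fail_at=None):
    """Advance a toy per-rank state, resuming from the latest complete snapshot."""
    start = latest_complete_step(root, n_ranks)
    if start is None:
        start = 0
        states = [{"value": float(r)} for r in range(n_ranks)]
    else:
        states = [load_snapshot(root, start, r) for r in range(n_ranks)]
    for step in range(start + 1, n_steps + 1):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        for s in states:
            s["value"] += 1.0  # stand-in for one solver time step
        if step % interval == 0:
            # Every rank writes at the same step: the coordinated checkpoint.
            for r, s in enumerate(states):
                write_snapshot(root, step, r, s)
    return states


def demo():
    """Fail at step 7, then restart and finish the 10-step run."""
    with tempfile.TemporaryDirectory() as root:
        try:
            run(root, n_ranks=3, n_steps=10, interval=4, fail_at=7)
        except RuntimeError:
            pass
        resumed_from = latest_complete_step(root, 3)
        final = run(root, n_ranks=3, n_steps=10, interval=4)
        return resumed_from, [s["value"] for s in final]
```

In `demo()`, the first run writes a snapshot at step 4 and fails at step 7, so the restart resumes from step 4 and reaches the same final state as an uninterrupted run. In a real parallel setting the per-rank writes would be followed by a barrier so that all processes agree the snapshot is complete; that synchronization is assumed here, not shown.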
Keywords
public domain software,parallel processing,large-scale parallel cfd application,large-scale parallel cfd simulation execution,checkpointing,open source software,blocking coordinated checkpoint,open foam,large-scale parallel cfd simulation,computational fluid dynamics,latest checkpoint,efficient execution,fault tolerant computing,reliability problem,parallel cfd simulation,fault tolerant,fault tolerant technology,cfd simulation,high performance computing,application-level synchronous checkpoint-recover method,cfd,hpc,checkpoint-recover,process execution,cfd periodic snapshot output,fail process