Faster and Scalable MPI Applications Launching

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS(2024)

引用 0|浏览15
暂无评分
摘要
Distributed parallel MPI applications are the dominant workload in many high-performance computing systems. While optimizing MPI application execution is a well-studied field, little work has considered optimizing the initial MPI application launching phase, which incurs extensive cross-machine communications and synchronization. The overhead of MPI application launching can be expensive, accounting for more than million core hours per 10K nodes annually on the production Tianhe-2A supercomputer, which will increase as the number of parallel machines used grows. Therefore, it is critical to optimize the MPI application launching process. This paper presents a novel approach to optimizing the MPI application launch. Our approach adopts a location-aware address generation rule to eliminate the need for address exchange and a topology-aware global communication scheme to optimize cross-machine synchronization. We then design a new application launch procedure to support the proposed optimizations to further reduce the pressure of the shared I/O system. Our techniques have been deployed to production in the Tianhe-2A supercomputer and the Next Generation Tianhe Supercomputer. Experimental results show that our approach scales well and outperforms alternative schemes, reducing the MPI application launching time by over 29% with 320K MPI processes.
更多
查看译文
关键词
Peer-to-peer computing,Hardware,Libraries,Multiprocessor interconnection,Production,Optimization,Full stack,Message passing interface (MPI),high performance computing (HPC),MPI application optimizaiton
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要