Optimizing All-to-All Collective Communication on Tianhe Supercomputer

ISPA/BDCloud/SocialCom/SustainCom (2022)

Abstract
All-to-all communication has a wide range of uses in parallel applications such as FFT. On most supercomputers, each node contains multiple cores, and message aggregation is an efficient technique for small messages. Using multiple leaders to aggregate messages significantly reduces intra-node overhead. However, compared to one-leader aggregation, existing multi-leader designs incur a higher message count and smaller aggregated message sizes. This paper proposes an Overlapped Multi-worker Multi-port all-to-all (OVALL) algorithm to scale the message size and parallelism of the aggregation algorithm. The algorithm exploits multi-core parallelism, concurrent communication, and communication overlapping in all-to-all exchanges. Experimental results show that OVALL's implementation achieves up to 5.9x or 18x speedup over the systems' built-in MPI on two different HPC systems. For the Fast Fourier Transform (FFT) application, OVALL is up to 2.7x (8192 cores, system A) or 5.6x (4800 cores, system B) faster than the built-in MPI at peak performance.
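The trade-off the abstract describes (one-leader aggregation minimizes inter-node message count; multi-leader aggregation raises the count but shrinks each aggregated message) can be illustrated with a back-of-envelope count. The sketch below is ours, not the paper's code; the function name, parameters, and the assumption that each of L leaders per node exchanges one aggregated message with each other node are hypothetical simplifications.

```python
# Hypothetical message-count model (not the paper's implementation).
# P = nodes * ppn processes run an all-to-all with per-pair payload msg_bytes.
def alltoall_stats(nodes, ppn, leaders, msg_bytes):
    """Return (inter-node message count, bytes per message) for three schemes."""
    procs = nodes * ppn
    # Naive: every process sends a separate message to every remote process.
    naive = (procs * (procs - ppn), msg_bytes)
    # One leader per node: leaders exchange fully aggregated node-to-node blocks.
    one_leader = (nodes * (nodes - 1), ppn * ppn * msg_bytes)
    # L leaders per node: each leader aggregates for ppn/L local sources,
    # so L times as many inter-node messages, each L times smaller.
    multi = (leaders * nodes * (nodes - 1), (ppn // leaders) * ppn * msg_bytes)
    return naive, one_leader, multi

naive, one, multi = alltoall_stats(nodes=16, ppn=32, leaders=4, msg_bytes=1024)
```

Under this model the total inter-node traffic is identical in all three schemes; only the count/size split changes, which is exactly the tension OVALL targets by also overlapping the aggregation work with communication.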
Keywords
MPI, All-to-all, Message Aggregation, Multi-core Concurrency