Framework for scalable intra-node collective operations using shared memory.

SC 2018

Cited by 37
Abstract
Collective operations are used in MPI programs to express common communication patterns, collective computations, or synchronization. In many collectives, such as MPI_Allreduce, the intra-node component of the collective lies on the critical path, as the inter-node communication cannot start until the intra-node component has completed. With increasing core counts per node, intra-node optimizations that leverage shared memory become more important. In this paper, we focus on the performance benefit of optimizing intra-node collectives using POSIX shared memory for synchronization and data sharing. We implement several collectives using basic primitives, or steps, as building blocks. Key components of our implementation include a dedicated intra-node collectives layer, careful layout of the data structures, and optimizations that exploit the memory hierarchy to balance parallelism against the latencies of data movement. A comparison of our implementation on top of MPICH shows significant performance speedups over the original MPICH implementation, MVAPICH, and OpenMPI.
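
The paper does not publish source code; the sketch below only illustrates the generic POSIX shared-memory mechanism (shm_open plus mmap) that an intra-node collectives layer of this kind builds on. The segment name /collectives_demo and the fixed 4 KiB size are hypothetical; a real implementation would add per-rank synchronization flags and the topology-aware data layout described in the abstract.

/*
 * Minimal sketch: create and map a named POSIX shared-memory segment
 * that every process on the node can open for zero-copy data sharing.
 * Compile on Linux with: cc shm_demo.c -lrt
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/collectives_demo"   /* hypothetical segment name */
#define SHM_SIZE 4096                  /* hypothetical segment size  */

int main(void)
{
    /* Create (or open) a named shared-memory object visible to all
     * processes on this node. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }

    /* Size the segment before mapping it. */
    if (ftruncate(fd, SHM_SIZE) < 0) { perror("ftruncate"); return 1; }

    /* Every process that maps the same name sees the same bytes,
     * which is what enables zero-copy intra-node data sharing. */
    void *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy((char *)buf, "payload visible to all ranks on this node");
    printf("wrote: %s\n", (char *)buf);

    munmap(buf, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);              /* remove the name when done */
    return 0;
}
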
Keywords
Synchronization, Optimization, Layout, Lead, Data structures, Concurrent computing, Topology