Accelerating OpenSHMEM Collectives Using In-Network Computing Approach

2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2019

Abstract
OpenSHMEM is one of the key programming models for High Performance Computing (HPC) applications with irregular communication patterns. It is particularly useful for problems that cannot be decomposed easily, such as graph partitioning. The programming model supports Remote Memory Access (RMA), atomics, and collective operations. In this paper, we explore and evaluate the In-network Computing approach for accelerating the OpenSHMEM collective operations, particularly the barrier, broadcast, and reduction operations. To achieve acceleration, In-network Computing combines hardware engines on the networking elements with software that can use these capabilities efficiently. We explore the value of this approach for collective operations on InfiniBand Host Channel Adapters (HCAs) and switches. In particular, we focus on the recently introduced collective offload feature provided by the Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ capability, which accelerates the barrier and reduction operations; the multicast capability accelerates the broadcast collective operation. To leverage the hardware capabilities, we complement them with a software stack that includes the Hierarchical Collectives (HCOLL) library and the SHARP layer. Our evaluation on Oak Ridge National Laboratory (ORNL)'s Summit system, the fastest supercomputer on the June 2019 Top 500 list, shows that the hardware and software acceleration in the In-network Computing approach is key to achieving the performance and scalability required for collectives and applications. For a 5120-process OpenSHMEM job, our results show that the barrier operation is 710% faster, broadcast is 370% faster, and the reduction operation is 10 times faster when compared with an implementation of the collective operations with no acceleration. Further, experiments with a 2D-Heat kernel show that the In-network Computing approach is very effective for real-world applications.
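The idea of hierarchical aggregation named in the abstract can be illustrated with a toy model. The following C sketch is not the SHARP, HCOLL, or OpenSHMEM API; the names `tree_result` and `tree_reduce_sum` and the `radix` parameter are hypothetical, chosen only for illustration. It assumes each aggregation level combines up to `radix` partial sums into one value, and it counts how many levels are needed before a single result remains at the root.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a hierarchical sum reduction (hypothetical helper,
 * not part of SHARP, HCOLL, or OpenSHMEM). */
typedef struct {
    long sum;     /* reduced value held at the root of the tree */
    int  levels;  /* number of aggregation levels traversed */
} tree_result;

/* Reduces vals[0..n-1] in place. At every level, each group of up to
 * `radix` neighbouring values is replaced by its partial sum, the way
 * one aggregation node would combine the inputs of its children.
 * The input array is overwritten with intermediate partial sums. */
tree_result tree_reduce_sum(long *vals, size_t n, size_t radix)
{
    assert(vals != NULL && n > 0 && radix >= 2);

    tree_result r = { 0, 0 };
    while (n > 1) {
        size_t out = 0;
        for (size_t i = 0; i < n; i += radix) {
            long partial = 0;
            for (size_t j = i; j < i + radix && j < n; ++j)
                partial += vals[j];
            /* out <= i here, so the write never clobbers unread input */
            vals[out++] = partial;
        }
        n = out;      /* the next level sees one value per group */
        r.levels++;
    }
    r.sum = vals[0];
    return r;
}
```

Under these assumptions, calling `tree_reduce_sum` on the eight values 1 through 8 with `radix` 2 yields a sum of 36 after 3 levels. The model captures only the shape of the aggregation; it says nothing about the latency or bandwidth behaviour measured in the paper.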
Keywords
Parallel Programming Models,Collective Operations,HPC,MPI,InfiniBand