Optimizing Non-commutative Allreduce Over Virtualized, Migratable MPI Ranks

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

Abstract
Dynamic load balancing can be difficult for MPI-based applications. Application logic and algorithms are often rewritten to enable dynamic repartitioning of the domain. An alternative approach is to virtualize the MPI ranks as threads, rather than operating system processes, and to migrate those threads around the system to balance the computational load. Adaptive MPI is one such implementation: it supports virtualization of MPI ranks as migratable user-level threads. However, this migratability can itself introduce new performance overheads in applications. In this paper, we identify non-commutative reduction operations as problematic for any runtime that supports either user-defined initial mappings of ranks or dynamic migration of ranks among the cores or nodes of a machine. We investigate the challenges of supporting efficient non-commutative reduction operations, and explore algorithmic alternatives such as recursive doubling and halving in combination with a novel adaptive message combining technique. We examine the tradeoffs among the different algorithms for various message sizes and mappings of ranks to cores, demonstrating our performance improvements using microbenchmarks.
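As a concrete illustration (not taken from the paper), the following minimal C sketch defines a user-specified non-commutative reduction via MPI_Op_create with commute = 0 and applies it in MPI_Allreduce. Because the operation is declared non-commutative, the MPI standard requires contributions to be combined in ascending rank order, which is precisely the ordering constraint that a runtime with virtualized, migratable ranks (such as AMPI) must preserve. The keep_left operator and all names are illustrative assumptions, not the paper's benchmark code.

```c
/* Hedged sketch: a non-commutative user-defined reduction in MPI.
 * keep_left returns the left operand, so a ∘ b = a. The op is
 * associative but not commutative, so result order matters. */
#include <mpi.h>
#include <stdio.h>

static void keep_left(void *invec, void *inoutvec, int *len, MPI_Datatype *dt)
{
    int *in = (int *)invec;
    int *inout = (int *)inoutvec;
    for (int i = 0; i < *len; i++)
        inout[i] = in[i];   /* result = left (lower-ranked) operand */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(keep_left, /* commute = */ 0, &op);

    int sendval = rank, result = -1;
    MPI_Allreduce(&sendval, &result, 1, MPI_INT, op, MPI_COMM_WORLD);

    /* With the required rank-ordered combination, every process prints 0,
     * i.e. the contribution of rank 0, regardless of where ranks are placed. */
    printf("rank %d: result = %d\n", rank, result);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}
```

If an implementation reordered the operands, for example because ranks were remapped or migrated and the reduction tree followed physical placement rather than rank order, the result would change, which is why such operations constrain the collective algorithms studied in the paper.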
Keywords
AMPI, MPI, Collectives, Allreduce, Communication Optimizations, Charm++