Extending $$\tau$$-Lop to model MPI blocking primitives on shared memory

The Journal of Supercomputing (2022)

Abstract
MPI communication optimization is essential for high-performance applications. Communication performance models have improved the efficiency of collective algorithms and the quality of communication scheduling. Instead of relying on hardware-related parameters such as bandwidth and latency, recent studies have focused on software models, which simplify modeling by representing a transmission as a sequence of implicit transfers. As a state-of-the-art software model, $$\tau$$-Lop adopts the concept of concurrent transfers to model communication on multiple platforms. However, $$\tau$$-Lop describes only the system as a whole, not individual MPI primitives, which makes it difficult to apply in systems where processes have different costs. Because the demand for high-precision modeling of concurrent communication is increasing, we extend $$\tau$$-Lop to model individual MPI primitives, covering this situation as well as further cases such as asynchronous communication. Modeling accuracy improves once factors such as concurrent transmission, waiting time, communication endpoints, channels, and protocols are taken into account. In tests of point-to-point and concurrent communication, the relative error of our model is below 40%, and in most cases its accuracy is more than 100% higher than that of the original $$\tau$$-Lop model, which shows that our work can be used for practical optimization.
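As a rough illustration of the modeling style the abstract describes, the sketch below estimates the cost of a shared-memory point-to-point message as two sequential copies (through an intermediate buffer) whose per-transfer time scales with the number of concurrent transfers. The function names, the linear contention assumption, and all parameter values are illustrative assumptions for this sketch, not formulas or measurements from the paper.

```python
# Hypothetical tau-Lop-style cost sketch for a shared-memory point-to-point
# message. Parameter values (bandwidth, overhead) are invented for
# illustration only.

def transfer_time(m, tau, bandwidth=8e9):
    """Time of one transfer of m bytes when tau processes share the same
    channel concurrently (simple linear-contention assumption)."""
    return tau * m / bandwidth

def p2p_cost(m, tau=1, overhead=5e-7):
    """Blocking send/recv modeled as two sequential copies
    (sender -> shared buffer -> receiver) plus a fixed per-message overhead."""
    return overhead + 2 * transfer_time(m, tau)

# Under contention (tau=2) each transfer slows down, so total cost grows.
assert p2p_cost(1 << 20, tau=2) > p2p_cost(1 << 20, tau=1)
```

The point of such software models is that `transfer_time` is measured empirically per platform rather than derived from hardware bandwidth and latency figures; the paper's extension additionally distinguishes waiting time, endpoints, channels, and protocols per primitive.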
Keywords
Message passing interface, Performance analysis, Parallel performance models, Concurrent transmission