Efficient Intranode Communication in GPU-Accelerated Systems

Parallel and Distributed Processing Symposium Workshops & PhD Forum (2012)

Cited by 23
Abstract
Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication, where it can result in several extra copy operations. In this work, we integrate GPU awareness into a popular MPI runtime system and develop techniques that significantly reduce the cost of intranode communication involving one or more GPUs. Experimental results show up to a 2x increase in bandwidth, yielding an average 4.3% improvement in the total execution time of a halo exchange benchmark.
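The contrast the abstract draws can be illustrated with a minimal sketch in C using MPI and the CUDA runtime. This is not the paper's code: the function names, buffer size, and message parameters are hypothetical, and the second variant assumes an MPI build with GPU (CUDA) awareness, as provided by the runtime described in the paper.

```c
/* Sketch: staging a device buffer through the host (required by a
   GPU-unaware MPI) versus passing the device pointer directly
   (possible under a GPU-aware MPI). Names and sizes are illustrative. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define N (1 << 20)  /* number of doubles to send (illustrative) */

/* GPU-unaware MPI: the device buffer must first be copied into a
   host buffer, adding an extra copy before the intranode transfer. */
void send_without_gpu_awareness(double *d_buf, int dest) {
    double *h_buf = malloc(N * sizeof(double));
    cudaMemcpy(h_buf, d_buf, N * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Send(h_buf, N, MPI_DOUBLE, dest, /*tag=*/0, MPI_COMM_WORLD);
    free(h_buf);
}

/* GPU-aware MPI: the device pointer is passed directly; the runtime
   detects that it refers to device memory and performs the data
   movement internally, eliminating the explicit staging copy. */
void send_with_gpu_awareness(double *d_buf, int dest) {
    MPI_Send(d_buf, N, MPI_DOUBLE, dest, /*tag=*/0, MPI_COMM_WORLD);
}
```

A matching receiver would mirror each variant with MPI_Recv; the saved device-to-host copy on each side is the source of the intranode bandwidth gains the abstract reports.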
Keywords
extra copy operation,efficient intranode communication,popular mpi runtime system,intranode communication,accelerator memory,halo exchange benchmark,experiment result,gpu device memory,total execution time,gpu-accelerated systems,memory space,current implementation,mpi,computer architecture,bandwidth,message passing,programming