A run-time optimization approach for reducing data movements using locality-aware searching

The Journal of Supercomputing (2014)

Abstract
The CPU–GPU communication bottleneck limits the performance improvement of GPU applications in heterogeneous GPGPU systems and is usually handled by data reuse optimization. This paper analyzes data reuse through a DAG abstraction and derives rules showing that run-time data reuse optimization can effectively relieve the bottleneck. Based on these rules, this paper proposes a run-time optimization framework for data reuse, called R-Tracker. R-Tracker uses a locality-aware searching approach to handle reuses. It not only implements data reuse optimization at low cost but also executes the searching, the data transfers, and the GPU computation concurrently. R-Tracker relaxes the constraints required by compiler-based approaches and thus achieves a better reuse effect. The experimental results show that R-Tracker improves performance by 1.77–16.42% over the compiler-based approach OpenMPC and by 1.40–8.39% over CGCM in single-node execution, and by 48.78–60% over CGCM in multi-node execution.
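The abstract does not describe R-Tracker's internal data structures. As a rough illustration of the general idea of run-time data reuse, the Python sketch below tracks which host address ranges already have a valid copy on the GPU and skips a transfer when a requested range is fully covered by such a copy. The class `ReuseTracker`, its methods `to_device` and `host_write`, and the newest-first scan (used here as a simple stand-in for "locality-aware searching") are all hypothetical. Device memory is only simulated, and the concurrent execution of searching, transfers, and GPU computation is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A host address range [start, end) that has a valid copy on the device."""
    start: int
    end: int


class ReuseTracker:
    """Hypothetical run-time tracker of device-resident host ranges (not R-Tracker itself)."""

    def __init__(self):
        self.regions = []            # most recently used region is kept at the end
        self.bytes_transferred = 0   # simulated host-to-device traffic
        self.hits = 0                # transfers avoided through reuse

    def _find(self, start, end):
        # Assumed locality heuristic: scan from the most recently used region,
        # since recently transferred data is the most likely to be requested again.
        for i in range(len(self.regions) - 1, -1, -1):
            r = self.regions[i]
            if r.start <= start and end <= r.end:
                return i
        return None

    def to_device(self, start, end):
        """Return True if [start, end) had to be copied, False if it was reused."""
        i = self._find(start, end)
        if i is not None:
            # Reuse: move the matching region to the most-recently-used position.
            self.regions.append(self.regions.pop(i))
            self.hits += 1
            return False
        # Miss: simulate the copy and remember the new device-resident range.
        self.regions.append(Region(start, end))
        self.bytes_transferred += end - start
        return True

    def host_write(self, start, end):
        # The host modified [start, end): conservatively drop every device copy
        # that overlaps it, even if the overlap is only partial.
        self.regions = [r for r in self.regions if r.end <= start or end <= r.start]


tracker = ReuseTracker()
tracker.to_device(0, 4096)      # first use: copied
tracker.to_device(1024, 2048)   # sub-range already resident: reused
tracker.host_write(0, 16)       # host write invalidates the device copy
tracker.to_device(1024, 2048)   # must be copied again
print(tracker.bytes_transferred, tracker.hits)  # 5120 1
```

Deciding reuse at run time lets the tracker see actual addresses and host writes. This is the kind of information a compiler may have to approximate conservatively, which relates to the relaxed constraints the abstract attributes to the run-time approach.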
Keywords
CPU-GPU, Run-time optimization, Dynamic searching, Data reuse