Unleashing the performance potential of CPU-GPU platforms for the 3D atmospheric Euler solver

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016

Abstract
Atmospheric modeling is a long-standing application on supercomputers, yet it has long suffered from low performance efficiency. In this paper, we take the 3D Euler equation solver (the most essential dynamic component of a non-hydrostatic atmospheric model) as the target application and explore the maximum performance efficiency achievable on CPU-GPU hybrid architectures. Besides presenting a suitable hybrid domain decomposition methodology and applying appropriate tuning techniques to both the CPU and GPU parts, we propose a novel GPU tuning technique designed specifically for the Euler solver: a customizable data caching mechanism with a thread warp rescheduling scheme. Combining all these optimizations, we achieve a remarkable performance boost on mainstream GPU architectures, including the Tesla Fermi C2050, K20X, K40 and K80. In particular, on the latest Tesla K80, we demonstrate a 31.64× speedup over a 12-core E5-2697 CPU. Furthermore, on a hybrid CPU-GPU node with two 12-core E5-2697 CPUs and two Tesla K80 GPUs, we achieve a sustained double-precision performance of 1.04 Tflops (16% of the peak), which is notably higher than the efficiency of similar optimization efforts on heterogeneous platforms (strictly less than 10%, as demonstrated in the related work). Finally, nearly linear weak scaling efficiency is achieved, which demonstrates the effectiveness of our domain decomposition method.
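The figures quoted in the abstract can be related with a little arithmetic. The sketch below is illustrative only and is not the authors' method: `implied_peak` simply divides the sustained rate by the stated efficiency, and `split_rows` is a hypothetical static load-balancing rule that assumes work is divided between one CPU and one GPU in proportion to their relative throughput. Both function names and the proportional rule are assumptions introduced here; the paper's actual hybrid decomposition may differ.

```python
def implied_peak(sustained_tflops: float, efficiency: float) -> float:
    """Peak rate implied by a sustained rate and a fraction-of-peak efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return sustained_tflops / efficiency


def split_rows(n_rows: int, gpu_speedup: float) -> tuple[int, int]:
    """Hypothetical proportional split of grid rows between a CPU and a GPU.

    Assumes the GPU processes rows `gpu_speedup` times faster than the CPU,
    so giving it gpu_speedup / (gpu_speedup + 1) of the rows would let both
    finish at roughly the same time. Returns (cpu_rows, gpu_rows).
    """
    if n_rows < 0 or gpu_speedup <= 0:
        raise ValueError("n_rows must be >= 0 and gpu_speedup must be > 0")
    gpu_share = gpu_speedup / (gpu_speedup + 1.0)
    gpu_rows = min(max(round(n_rows * gpu_share), 0), n_rows)
    return n_rows - gpu_rows, gpu_rows


if __name__ == "__main__":
    # Figures taken from the abstract: 1.04 Tflops sustained at 16% of peak.
    print(f"implied node peak: {implied_peak(1.04, 0.16):.2f} Tflops")
    # 31.64x is the K80-vs-CPU speedup quoted in the abstract; the grid
    # size of 1024 rows is an arbitrary illustrative value.
    cpu_rows, gpu_rows = split_rows(1024, 31.64)
    print(f"CPU rows: {cpu_rows}, GPU rows: {gpu_rows}")
```

Under this assumed rule, nearly all of the domain goes to the GPU, which is consistent with the large speedup reported but says nothing about how the paper handles halo exchange or communication overlap.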
Keywords
domain decomposition method, weak scaling efficiency, thread warp rescheduling scheme, customizable data caching mechanism, GPU tuning technique, 3D atmospheric Euler solver, CPU-GPU platforms