Co-Designing an OpenMP GPU Runtime and Optimizations for Near-Zero Overhead Execution

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022

Abstract
GPU accelerators are ubiquitous in modern HPC systems. To program them, users can choose between vendor-specific, native programming models, such as CUDA, which provide simple parallelism semantics with minimal runtime support, and portable alternatives, such as OpenMP, which offer rich parallel semantics and feature an extensive runtime library to support execution. While the operations of such a runtime can easily limit performance and drain resources, this cost has to some degree been regarded as an unavoidable overhead. In this work we present a co-design methodology for optimizing applications using a specifically crafted OpenMP GPU runtime such that most use cases induce near-zero overhead. Specifically, our approach exposes runtime semantics and state to the compiler so that optimizations can effectively eliminate abstractions and runtime state from the final binary. With the help of user-provided assumptions we can further optimize common patterns that otherwise increase resource consumption. We evaluated our prototype, built on top of the LLVM/OpenMP GPU offloading infrastructure, with multiple HPC proxy applications and benchmarks. A comparison of CUDA, the original OpenMP runtime, and our co-designed alternative shows that our approach significantly improves performance and significantly lowers resource consumption. In many cases we can closely match the CUDA implementation without sacrificing the versatility and portability of OpenMP.
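As an illustration of the portable programming model the abstract refers to, the following is a minimal sketch of an OpenMP offloaded loop. It is not taken from the paper: the function name `saxpy` and its parameters are hypothetical, and the sketch assumes a compiler with OpenMP target offloading enabled. Without offloading support or an available device, the same loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * The target construct asks the OpenMP runtime to run the loop on a device;
 * the map clauses describe which array sections are copied to and from it.
 * If no device is available, execution falls back to the host. */
void saxpy(size_t n, float a, const float *x, float *y) {
  assert(x != NULL && y != NULL);
  #pragma omp target teams distribute parallel for \
      map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

A single combined directive like this one hides device selection, data movement, and the distribution of iterations across teams and threads behind the runtime library. That runtime support is the kind of overhead the abstract says the co-designed runtime and compiler optimizations aim to remove.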
Keywords
OpenMP, GPU, offloading, compiler optimization