Fireiron: A Data-Movement-Aware Scheduling Language for GPUs

PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 2020

Abstract
High GPU performance can only be achieved if a kernel efficiently uses the multi-layered compute and memory hierarchies. For example, accelerators such as NVIDIA's Tensor Cores require specific mappings of threads to data that must be considered in data movements to and from registers. Current compilers struggle to match the performance of vendor libraries like cuBLAS, which are developed by experts in assembly. This manual low-level coding is time-consuming and makes it hard to unlock the full GPU potential, preventing the experimentation needed to achieve even higher performance. In this paper we introduce Fireiron, a scheduling language aimed at performance experts. Fireiron provides high-level abstractions for expressing GPU optimizations that are unavailable to compilers today and which so far must be written in assembly. Our innovation is that both computations and data movements are first-class concepts that can be separately mapped to threads, as required for the efficient use of specialized hardware like Tensor Cores. We evaluate Fireiron on three GPU architectures against expert-written advanced matrix multiplications. First, we show that Fireiron schedules are able to express the strategies of these implementations while requiring about 6× fewer lines of code. Second, we show that the code generated by Fireiron schedules outperforms the fastest implementations (provided by cuBLAS) by more than 2×.
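To illustrate the constraint the abstract refers to, the sketch below is a minimal plain CUDA WMMA kernel (not Fireiron code, whose schedule syntax is not shown here): when using Tensor Cores, the mapping of matrix elements to the registers of the 32 threads in a warp is fixed by the hardware, so the data movements into and out of registers must follow that prescribed thread-to-data mapping. The kernel name and tile sizes are illustrative assumptions.

```cuda
#include <mma.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 tile on Tensor Cores (sm_70+).
// The wmma fragments are opaque: which elements land in which thread's
// registers is dictated by the hardware, so the loads/stores below are
// data movements with a fixed thread-to-data mapping.
__global__ void wmma_tile(const half *A, const half *B, float *C,
                          int lda, int ldb, int ldc) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    // Data movement into registers: all 32 threads of the warp cooperate,
    // each receiving the hardware-mandated slice of the tile.
    wmma::load_matrix_sync(a_frag, A, lda);
    wmma::load_matrix_sync(b_frag, B, ldb);
    // Compute on Tensor Cores.
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    // Data movement back to memory, again with a prescribed mapping.
    wmma::store_matrix_sync(C, c_frag, ldc, wmma::mem_row_major);
}
```

Fireiron's contribution, per the abstract, is to make such data movements schedulable entities of their own rather than implicit side effects of the computation.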
Keywords
Data Movement, GPU, Optimization, Compilers, Fireiron