Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation

ICS（2021）

引用 5|浏览5

暂无评分

摘要

ABSTRACTLoop tiling is a key high-level transformation which is known to maximize locality in loop intensive programs. It has been successfully applied to a number of applications including tensor contractions, iterative stencils and machine learning. This technique has also been extended to a wide variety of computational domains and architectures. The performance achieved with this critical transformation largely depends on a set of inputs given, the tile sizes, due to the complex trade-off between locality and parallelism. This problem is exacerbated in GPGPU architectures due to limited hardware resources such as the available shared-memory. In this paper we present a new technique to compute resource conscious tile sizes for affine programs. We use Integer Linear Programming (ILP) constraints and objectives in a cross-compiler fashion to faithfully and effectively mimic the transformations applied in a polyhedral GPU compiler (PPCG). Our approach significantly reduces the need for experimental auto-tuning by generating only two tile size configurations that achieve strong out-of-the-box performance. We evaluate the effectiveness of our technique using the Polybench benchmark suite on two GPGPUs, an AMD Radeon VII and an NVIDIA Tesla V100, using OpenCL and CUDA programming models. Experimental validation reveals that our approach achieves nearly 75% of the best empirically found tile configuration across both architectures.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要