TwinKernels: an execution model to improve GPU hardware scheduling at compile time.

CGO 2017

Abstract
As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate into the peak performance that GPUs can offer, often leaving the GPU's resources under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels of TLP due to hardware bottlenecks. Unfortunately, this tolerance is not effectively exploited by the Single Instruction Multiple Thread (SIMT) execution model employed by current GPU compute frameworks. Under the SIMT execution model, GPU applications tend to send bursts of memory requests that compete for GPU memory resources. Traditionally, hardware units such as the wavefront scheduler are used to manage such requests. However, the scheduler struggles when computational operations are not abundant enough to effectively hide the long latency of memory operations. In this paper, we propose a Twin Kernel Multiple Thread (TKMT) execution model, a compiler-centric solution that improves hardware scheduling at compile time. TKMT distributes bursts of memory requests across wavefronts more evenly through static instruction scheduling. Our results show that TKMT offers a 12% average improvement over the baseline SIMT implementation across a variety of benchmarks on AMD Radeon systems.
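To make the twin-kernel idea concrete, the following CUDA sketch approximates it at the source level. This is purely illustrative, not the paper's implementation: TKMT produces two differently scheduled binaries of the same kernel at compile time and assigns wavefronts to them at dispatch, whereas this sketch fakes the effect with a warp-ID branch between two semantically identical but differently ordered code paths (and a real compiler backend may reschedule them anyway). The kernel name `twin_kernel` and the even/odd warp assignment are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

__global__ void twin_kernel(const float* __restrict__ a,
                            const float* __restrict__ b,
                            float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Each warp (wavefront on AMD hardware) picks one of two semantically
    // identical instruction schedules based on its ID, so that neighboring
    // wavefronts stagger the moment they issue their second memory request.
    int warp_id = threadIdx.x / warpSize;

    if ((warp_id & 1) == 0) {
        // Schedule A: both loads issued back to back (front-loaded burst).
        float x = a[i];
        float y = b[i];
        out[i] = x * y + x;
    } else {
        // Schedule B: independent ALU work placed between the two loads
        // delays the second memory request, spreading the burst over time.
        float x = a[i];
        float x_copy = x * 1.0f;   // placeholder ALU op; result equals x
        float y = b[i];
        out[i] = x * y + x_copy;
    }
}
```

In the actual TKMT model no such branch exists in the source: the compiler emits two instruction schedules of the same kernel body, and the wavefront dispatcher decides which schedule each wavefront executes.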
Keywords
TwinKernels, execution model, GPU hardware scheduling, compile time, parallel threads, thread-level parallelism, TLP, GPU peak performance, hardware bottlenecks, single instruction multiple thread, SIMT, GPU applications, GPU memory resources, hardware units, wavefront scheduler, computational operations, twin kernel multiple thread, TKMT execution model, compiler-centric solution, hardware scheduling, AMD Radeon systems