A detailed GPU cache model based on reuse distance theory

High Performance Computer Architecture (2014)

Abstract
As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for performance and energy. However, optimising cache locality systematically requires insight into and prediction of cache behaviour. On sequential processors, stack distance or reuse distance theory is a well-known means to model cache behaviour. However, it is not straightforward to apply this theory to GPUs, mainly because of the parallel execution model and fine-grained multi-threading. This work extends reuse distance to GPUs by modelling: (1) the GPU's hierarchy of threads, warps, threadblocks, and sets of active threads, (2) conditional and non-uniform latencies, (3) cache associativity, (4) miss-status holding-registers, and (5) warp divergence. We implement the model in C++ and extend the Ocelot GPU emulator to extract lists of memory addresses. We compare our model with measured cache miss rates for the Parboil and PolyBench/GPU benchmark suites, showing a mean absolute error of 6% and 8% for two cache configurations. We show that our model is faster and even more accurate compared to the GPGPU-Sim simulator.
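To make the sequential baseline concrete, the sketch below illustrates classic reuse distance theory as the abstract describes it, before the paper's GPU extensions: for each access in a memory trace, the reuse distance is the number of distinct cache lines touched since the previous access to the same line, and under a fully associative LRU cache of C lines an access hits iff its distance is below C. This is an illustrative C++ sketch (the function names and the O(N·M) stack walk are ours, not the paper's implementation):

```cpp
#include <cstdint>
#include <list>
#include <vector>

// Reuse distance of each access in a trace of cache-line addresses.
// Returns -1 for a cold (first-time) access, i.e. infinite distance.
std::vector<long> reuse_distances(const std::vector<uint64_t>& lines) {
    std::list<uint64_t> stack;               // LRU stack, most recent at front
    std::vector<long> dist;
    for (uint64_t line : lines) {
        long d = 0;
        auto it = stack.begin();
        for (; it != stack.end(); ++it, ++d)  // count distinct lines above it
            if (*it == line) break;
        if (it == stack.end()) {
            dist.push_back(-1);               // never seen before: cold miss
        } else {
            dist.push_back(d);
            stack.erase(it);                  // move line to the top
        }
        stack.push_front(line);
    }
    return dist;
}

// Miss rate of a fully associative LRU cache holding capacity_lines lines:
// an access misses iff its reuse distance is infinite or >= capacity.
double miss_rate(const std::vector<long>& dist, long capacity_lines) {
    long misses = 0;
    for (long d : dist)
        if (d < 0 || d >= capacity_lines) ++misses;
    return static_cast<double>(misses) / static_cast<double>(dist.size());
}
```

For the trace A B A C B A, the distances are ∞ ∞ 1 ∞ 2 2, so a 3-line cache incurs only the three cold misses. The paper's contribution is extending exactly this hit/miss criterion to the GPU setting, where warp scheduling, associativity, MSHRs, and divergence perturb the sequential access order.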
Keywords
C++ language, benchmark testing, cache storage, graphics processing units, multi-threading, storage allocation, GPU cache model, Ocelot GPU emulator, Parboil benchmark suites, PolyBench/GPU benchmark suites, active thread hierarchy, cache associativity, cache behaviour prediction, cache configurations, cache locality optimisation, cache miss rates, conditional non-uniform latencies, fine-grained multithreading, mean absolute error, memory address list extraction, miss-status holding-registers, parallel execution model, reuse distance theory, sequential processors, stack distance, thread hierarchy, threadblock hierarchy, warp divergence, warp hierarchy