Analyzing Data Locality on GPU Caches Using Static Profiling of Workloads.

IEEE Access (2023)

Abstract

The growing diversity of workloads drives research into using GPUs more effectively despite their limited memory. In particular, understanding and exploiting the data locality of workloads is essential for efficient use of GPU memory and caches, which are relatively smaller than a CPU's. It is also important to understand the GPU memory hierarchy in order to use it efficiently in a multi-threaded environment. Although previous approaches have analyzed data locality on GPUs, they focused on the global memory and L2 cache levels, with profiling at the thread block level. Data locality at the warp level has received little study. In particular, the concept of coalescing has been defined, but methods for measuring the degree of coalescing have not been discussed. Our study focuses on data locality at the L1 cache level, the smallest but fastest cache level, to analyze the impact of data locality. To this end, our study profiles data locality at the warp level, the smallest grouping of GPU threads. This paper introduces a quantitative measure for coalescing alongside static profiling of data locality. Furthermore, it offers a means of refining locality estimates by examining L1 cache access patterns. To substantiate our approach, we validate the estimated data locality against a range of real-world GPU benchmarks, including Rodinia and PolyBench. Our experimental results reveal a substantial correlation between the data locality metrics and cache utilization, affirming the efficacy of the proposed method.
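
The abstract does not define its coalescing measure, so the following is only an illustrative sketch of what a warp-level "degree of coalescing" could look like, not the paper's metric. It assumes a 32-thread warp, a 128-byte L1 cache line, and naturally aligned accesses that never cross a line boundary; the names `lines_touched` and `coalescing_degree` are hypothetical.

```python
from typing import Sequence

LINE_BYTES = 128  # assumed L1 cache line size; the real value depends on the GPU


def lines_touched(addresses: Sequence[int], line_bytes: int = LINE_BYTES) -> int:
    """Count the distinct cache lines hit by one warp-wide memory instruction."""
    return len({addr // line_bytes for addr in addresses})


def coalescing_degree(addresses: Sequence[int],
                      access_bytes: int = 4,
                      line_bytes: int = LINE_BYTES) -> float:
    """Hypothetical metric: minimum lines needed / lines actually touched.

    Returns 1.0 for a perfectly coalesced access and approaches
    1 / len(addresses) when every thread lands on its own line.
    Assumes each address is aligned to access_bytes.
    """
    if not addresses:
        raise ValueError("a warp access needs at least one address")
    distinct_bytes = len(set(addresses)) * access_bytes
    ideal_lines = -(-distinct_bytes // line_bytes)  # ceiling division
    return ideal_lines / lines_touched(addresses, line_bytes)


# Byte addresses generated by the 32 threads of one warp for three strides.
unit_stride = [4 * tid for tid in range(32)]      # a[tid], 4-byte elements
stride_two = [8 * tid for tid in range(32)]       # a[2 * tid]
line_stride = [128 * tid for tid in range(32)]    # one line per thread

print(coalescing_degree(unit_stride))   # 1.0
print(coalescing_degree(stride_two))    # 0.5
print(coalescing_degree(line_stride))   # 0.03125
```

In a static-profiling setting, the per-thread addresses would be derived symbolically from the index expressions in the PTX code rather than listed by hand as they are here.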

Keywords

Data locality, GPU cache, GPU profiling, GPGPU workload analysis, PTX code