GVProf: A Value Profiler for GPU-Based Clusters

The International Conference for High Performance Computing, Networking, Storage, and Analysis (2020)

Abstract
GPGPUs are widely used in high-performance computing systems to accelerate scientific and machine learning workloads. Developing efficient GPU kernels is critically important to obtain "bare-metal" performance on GPU-based clusters. In this paper, we describe the design and implementation of GVProf, the first value profiler that pinpoints value-related inefficiencies in applications running on NVIDIA GPU-based clusters. The novelty of GVProf resides in its ability to detect temporal and spatial value redundancies, which provides useful information to guide code optimization. GVProf can monitor production multi-node multi-GPU executions in clusters. Our experiments with well-known GPU benchmarks and HPC applications show that GVProf incurs acceptable overhead and scales to large executions. Using GVProf, we optimized several HPC and machine learning workloads on one NVIDIA V100 GPU. In one case study of LAMMPS, optimizations based on information from GVProf led to whole-program speedups ranging from 1.37x on a single GPU to 1.08x on 64 GPUs.
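
To make the two kinds of redundancy named in the abstract concrete, the following is a minimal Python sketch on toy data. It is not GVProf's implementation, which profiles GPU executions rather than host-side lists. The definitions used here are simplifying assumptions: "temporal" redundancy is taken to mean a write that stores the value an address already holds, and "spatial" redundancy is taken to mean many elements of one array sharing the same value. The function names and the `(address, value)` trace format are hypothetical.

```python
from collections import Counter


def temporal_redundancy(trace):
    """Fraction of writes that store the value the address already holds.

    `trace` is a list of (address, value) pairs in program order
    (a hypothetical format chosen for this illustration).
    """
    memory = {}      # last value written to each address
    redundant = 0
    writes = 0
    for addr, value in trace:
        writes += 1
        if addr in memory and memory[addr] == value:
            redundant += 1
        memory[addr] = value
    return redundant / writes if writes else 0.0


def spatial_redundancy(values):
    """Fraction of array elements equal to the most common value."""
    if not values:
        return 0.0
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)


# Toy write trace: address 0 is rewritten with the same value twice.
trace = [(0, 1.0), (1, 2.0), (0, 1.0), (1, 3.0), (0, 1.0)]
print(temporal_redundancy(trace))

# Toy array: three of four elements are identical.
print(spatial_redundancy([0.0, 0.0, 0.0, 5.0]))
```

Under these assumed definitions, a high temporal fraction suggests writes that could be skipped, and a high spatial fraction suggests data that could be stored or computed more compactly.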
Keywords
High performance computing, Performance analysis, Parallel programming, Supercomputers