CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming

Xinyao Yi, David Stokes, Yonghong Yan, Chunhua Liao

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021

Abstract
Achieving high performance on NVIDIA GPUs with CUDA is known to be challenging. A GPU has hundreds or thousands of cores, so a program must expose sufficient parallelism to achieve maximum GPU utilization. A system with GPU accelerators has a heterogeneous and deep memory system that programmers must use effectively and correctly to take full advantage of the GPU’s parallel...
Keywords
Graphics processing units, Systems architecture, Benchmark testing, Parallel processing, Tools, Performance analysis, Complexity theory