Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

IEEE Transactions on Parallel and Distributed Systems (2014)

Abstract
Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total cost of ownership. Despite recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system to improve the throughput of concurrent kernel executions on the GPU. Kernelet embraces transparent memory management and PCI-e data transfer techniques, and dynamic slicing and scheduling techniques for kernel executions. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely, slices). Each slice has tunable occupancy to allow co-scheduling with other slices for high GPU utilization. We develop a novel Markov chain-based performance model to guide the scheduling decision. Our experimental results demonstrate up to 31 percent and 23 percent performance improvement on NVIDIA Tesla C2050 and GTX680 GPUs, respectively.
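The slicing idea in the abstract can be illustrated with a minimal Python sketch. It only models the bookkeeping: a kernel's grid of thread blocks is cut into slices, and slices from two kernels are interleaved so they could be co-scheduled. The names `slice_kernel`, `round_robin`, and the `Slice` tuple are hypothetical, and the round-robin ordering is a simple stand-in, not the Markov chain-based model that Kernelet uses to choose what to co-schedule.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple

# Hypothetical record: (kernel name, first block index, number of blocks).
Slice = Tuple[str, int, int]


def slice_kernel(name: str, num_blocks: int, slice_size: int) -> Iterator[Slice]:
    """Cut a grid of `num_blocks` thread blocks into slices of at most `slice_size`."""
    if num_blocks < 0 or slice_size <= 0:
        raise ValueError("num_blocks must be >= 0 and slice_size must be > 0")
    for offset in range(0, num_blocks, slice_size):
        # The last slice may be smaller than slice_size.
        yield (name, offset, min(slice_size, num_blocks - offset))


def round_robin(a: List[Slice], b: List[Slice]) -> List[Slice]:
    """Interleave two slice lists (assumption: a placeholder scheduling policy)."""
    order: List[Slice] = []
    for pair in zip_longest(a, b):
        order.extend(s for s in pair if s is not None)
    return order


slices_a = list(slice_kernel("A", num_blocks=10, slice_size=4))
slices_b = list(slice_kernel("B", num_blocks=6, slice_size=3))
schedule = round_robin(slices_a, slices_b)
print(schedule)
```

In a real GPU runtime, each slice would be launched as its own smaller kernel with a block-index offset, and the slice size would be the knob that tunes occupancy; both of those details are outside this sketch.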
Keywords
processor scheduling, dynamic scheduling techniques, kernel slicing, dynamic slicing techniques, runtime support, concurrency control, GTX680 GPUs, NVIDIA Tesla C2050, storage management, operating system kernels, graphics processing units, total ownership cost, program slicing, concurrent GPU kernel executions, Markov processes, shared environments, Kernelet, performance modeling, GPGPU, high-throughput GPU kernel executions, performance evaluation, runtime system, PCI-e data transfer techniques, Markov chain, task scheduling, transparent memory management, suboptimal throughput, Markov chain-based performance model, graphics processors, throughput, kernel, instruction sets, memory management