An Intelligent Framework for Oversubscription Management in CPU-GPU Unified Memory

ArXiv (2022)

Abstract
Unified Virtual Memory (UVM) improves GPU programmability by enabling on-demand data movement between CPU memory and GPU memory. Thanks to this emerging feature, GPUs have become more ubiquitous in systems ranging from servers to data centers, and there is an increasing trend of adopting GPUs for large-scale, general-purpose applications. However, this trend soon creates a dilemma: the limited capacity of GPU device memory is oversubscribed by the ever-growing application working set. Oversubscription overhead becomes a major performance bottleneck for data-intensive workloads running on GPUs with UVM. This paper proposes a novel intelligent framework for oversubscription management in CPU-GPU UVM. We analyze the current rule-based methods for GPU memory oversubscription with unified memory, as well as the current learning-based methods for other computer architecture components. We then identify the performance gap between the existing rule-based methods and the theoretical upper bound, the advantages of applying machine intelligence, and the limitations of the existing learning-based methods. Our framework consists of an access pattern classifier followed by a pattern-specific Transformer-based model that uses a novel loss function aimed at reducing page thrashing. A policy engine leverages the model's predictions to perform accurate page prefetching and pre-eviction. We evaluate the framework on a set of 11 memory-intensive benchmarks from popular benchmark suites. Our solution outperforms the state-of-the-art (SOTA) methods for oversubscription management, reducing the number of pages thrashed by 64.4% under 125% memory oversubscription relative to the baseline, while the SOTA method reduces it by only 17.3%.
Our solution achieves an average IPC improvement of 1.52X under 125% memory oversubscription and 3.66X under 150% memory oversubscription. It also outperforms the existing learning-based methods for page address prediction, improving top-1 accuracy by 6.45% on average (up to 41.2%) for a single GPGPU workload and by 10.2% on average (up to 30.2%) for multiple concurrent GPGPU workloads.
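The pipeline described in the abstract (pattern classifier → pattern-specific predictor → policy engine for prefetch and pre-eviction) can be illustrated with a minimal sketch. All names and logic below are our assumptions for illustration, not the authors' implementation: the stride-extrapolating predictor stands in for the paper's Transformer-based model, and an LRU-ordered resident set stands in for GPU device memory.

```python
# Hypothetical sketch (assumed names/logic, NOT the paper's implementation):
# a classifier routes page-access history to a predictor, and a policy engine
# turns predicted page addresses into prefetch and pre-eviction decisions.
from collections import deque

def classify_pattern(deltas):
    """Crude stand-in for the access-pattern classifier."""
    if deltas and all(d == deltas[0] for d in deltas):
        return "streaming"          # constant stride
    return "irregular"

def predict_next_pages(history, k=4):
    """Stand-in for the pattern-specific (Transformer-based) predictor:
    extrapolates the last observed stride; a real model would be learned."""
    if len(history) < 2:
        return []
    deltas = [b - a for a, b in zip(history, history[1:])]
    stride = deltas[-1] if classify_pattern(deltas) == "streaming" else 1
    return [history[-1] + stride * (i + 1) for i in range(k)]

class PolicyEngine:
    """Turns predictions into prefetches and pre-evictions under a fixed
    device-memory capacity (LRU order is the fallback victim order)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = deque()     # front = least recently used

    def access(self, page, history):
        self._touch(page)
        predicted = predict_next_pages(history)
        for p in predicted:         # prefetch pages predicted to be used next
            self._touch(p)
        # pre-evict: prefer victims NOT in the predicted near-future set,
        # which is how accurate prediction reduces page thrashing
        while len(self.resident) > self.capacity:
            victim = next((q for q in self.resident if q not in predicted),
                          self.resident[0])
            self.resident.remove(victim)

    def _touch(self, page):
        if page in self.resident:
            self.resident.remove(page)
        self.resident.append(page)
```

For a streaming (constant-stride) access sequence, the engine prefetches the upcoming pages and evicts only pages outside the predicted window, so the resident set tracks the moving working set instead of thrashing on it.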