Virtual PIM: Resource-aware Dynamic DPU Allocation and Workload Scheduling Framework for Multi-DPU PIM Architecture

Donghyeon Kim, Taehoon Kim, Inyong Hwang, Taehyeong Park, Hanjun Kim, Youngsok Kim, Yongjun Park

2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), 2023

Abstract
Processing-in-Memory (PIM) is an attractive device that can effectively satisfy the rapidly increasing demands of memory-intensive workloads in emerging application domains, such as deep learning and big data processing. Thanks to the integrated design of the main memory (MRAM) and multiple data processing units (DPUs) on a single chip, PIM devices can provide massive parallelism from numerous DPUs and substantial bandwidth between the MRAM and DPUs, thus achieving high performance for memory-intensive workloads. However, although recent PIM architectures, including UPMEM, can efficiently execute a single memory-intensive application, they fail to efficiently orchestrate multiple applications across multiple DPU resources due to conservative resource allocation, the lack of a resource monitoring system, and coarse scheduling granularity. To solve these problems, we propose a novel resource-aware dynamic DPU allocation and workload scheduling framework, called Virtual PIM, for multi-DPU PIM architectures such as UPMEM. The framework first virtualizes the DPUs and MRAM to ensure data consistency in multi-application environments. For dynamic DPU allocation, the Virtual PIM framework continuously gathers resource requests from multiple processes and current DPU occupancy information to estimate the dynamic DPU resource status, irrespective of PIM hardware support. Based on this information, the framework dynamically allocates DPUs and schedules workloads at a fine granularity with minimum occupancy to maximize total DPU utilization. Our evaluations in real PIM environments demonstrate that Virtual PIM significantly improves system throughput and average normalized turnaround time by up to 4.83x and 3.45x, respectively, compared to the SLURM-based baseline.
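The abstract's minimum-occupancy allocation idea can be illustrated with a small sketch: a central allocator tracks per-DPU occupancy and grants each request the least-occupied DPUs, so concurrent processes spread across the DPU pool instead of contending for the same units. The class and method names (`DpuAllocator`, `request`, `release`) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of resource-aware dynamic DPU allocation with a
# minimum-occupancy policy; names are illustrative, not the paper's API.
import heapq

class DpuAllocator:
    def __init__(self, num_dpus):
        # occupancy[i] counts workloads currently scheduled on DPU i
        self.occupancy = [0] * num_dpus

    def request(self, n):
        """Grant the n least-occupied DPUs (minimum-occupancy policy)."""
        chosen = heapq.nsmallest(n, range(len(self.occupancy)),
                                 key=lambda i: self.occupancy[i])
        for i in chosen:
            self.occupancy[i] += 1
        return chosen

    def release(self, dpus):
        """Return DPUs to the pool when a workload finishes."""
        for i in dpus:
            self.occupancy[i] -= 1

alloc = DpuAllocator(num_dpus=4)
a = alloc.request(2)   # first process gets two idle DPUs
b = alloc.request(2)   # second process gets the remaining idle DPUs
print(sorted(a), sorted(b))  # → [0, 1] [2, 3]: disjoint sets, all DPUs busy
```

In this toy model, two concurrent requests end up on disjoint DPU sets, maximizing total utilization; the real framework additionally virtualizes DPUs/MRAM for consistency and monitors occupancy continuously.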