Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads (Extended Abstract).

SIGMETRICS Performance Evaluation Review (2022)

Abstract
Hazelwood et al. observed that at Facebook data centers, variations in user activity (e.g., due to diurnal load) resulted in low-utilization periods with large pools of idle resources [4]. To make use of these resources, they proposed running machine learning training tasks. Analogous low-utilization periods have also been observed at the scale of individual GPUs, during both GPU-based inference [1] and training [6]. The proposed solution to this latter problem was colocating additional inference or training tasks on a single GPU.

We go a step further than these previous studies by considering the GPU at the microarchitectural level rather than treating it as a black box. Broadly, we consider the following question: are current GPU application- and block-level scheduling mechanisms sufficient to guarantee predictable and low turnaround times for latency-sensitive inference requests, while also consistently making use of unoccupied resources for best-effort training tasks? To answer this question, we explore both NVIDIA's concurrency mechanisms and the characteristics of the workload itself. Complicating our analyses, the NVIDIA scheduling hierarchy is proprietary and some mechanisms (e.g., time-slicing) are not well documented, so their behavior must be reverse-engineered from empirical observation.
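To illustrate the kind of empirical probe this question suggests (this is a sketch, not code from the paper), the example below colocates a saturating best-effort "training" kernel on a low-priority CUDA stream with a small latency-sensitive "inference" kernel on a high-priority stream, and measures the inference kernel's turnaround time with CUDA events. The `busy_kernel` helper, kernel shapes, and iteration counts are illustrative assumptions.

```cuda
// Sketch: does colocated best-effort work delay a latency-sensitive kernel?
// Assumed workload: busy_kernel, grid sizes, and iteration counts are placeholders.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busy_kernel(float *data, int iters) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[idx];
    for (int i = 0; i < iters; ++i)   // spin to occupy SMs for a while
        v = v * 1.0001f + 0.0001f;
    data[idx] = v;
}

int main() {
    const int n = 1 << 20;
    float *buf_a, *buf_b;
    cudaMalloc(&buf_a, n * sizeof(float));
    cudaMalloc(&buf_b, n * sizeof(float));

    // Streams with different priorities: the "inference" stream gets the highest.
    int lo, hi;
    cudaDeviceGetStreamPriorityRange(&lo, &hi);
    cudaStream_t train_stream, infer_stream;
    cudaStreamCreateWithPriority(&train_stream, cudaStreamNonBlocking, lo);
    cudaStreamCreateWithPriority(&infer_stream, cudaStreamNonBlocking, hi);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Best-effort "training" work saturating the device on the low-priority stream.
    busy_kernel<<<4096, 256, 0, train_stream>>>(buf_a, 200000);

    // Latency-sensitive "inference" request on the high-priority stream.
    cudaEventRecord(start, infer_stream);
    busy_kernel<<<64, 256, 0, infer_stream>>>(buf_b, 1000);
    cudaEventRecord(stop, infer_stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("inference-kernel turnaround with colocated training work: %.3f ms\n", ms);

    cudaDeviceSynchronize();
    cudaFree(buf_a);
    cudaFree(buf_b);
    return 0;
}
```

Repeating such a measurement with and without the colocated kernel, and across mechanisms such as stream priorities, MPS, or time-slicing, is one way scheduling behavior can be inferred when documentation is unavailable.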
Keywords
deep learning workloads, NVIDIA GPUs, concurrency mechanisms, deep learning