Independent Forward Progress of Work-groups

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)(2020)

Cited 8|Views109
No score
Abstract
GPUs have evolved from providing highly-constrained programmability for a single kernel to using pre-emption to ensure independent forward progress for concurrently executing kernels. However, modern GPUs do not ensure independent forward progress for kernels that use fine-grain synchronization to coordinate inter-work-group execution. Enabling independent forward progress among work-groups (WGs) is challenging as pre-empted kernels may be rescheduled with fewer hardware resources. This can lead to oversubscribed execution scenarios that deadlock current hardware even for correctly written code. Prior work addresses this problem by requiring programmers to specify resource requirements and assuming static resource allocation, which adds scheduling constraints and reduces portability. We propose a family of novel hardware approaches — trading off hardware complexity for performance — that provide independent forward progress in the presence of fine-grain inter-WG synchronization and dynamic resource allocation. Additionally, we propose new waiting atomic instructions compatible with proposed C++ 20 extensions. Our final design, Autonomous Work-Groups (AWG), uses hints from regular and waiting atomics to cooperatively schedule WGs within a kernel, improving efficiency and virtualizing hardware resources. In non-oversubscribed scenarios, AWG outperforms a busy-waiting baseline (which deadlocks in oversubscribed scenarios) by 12x on average for benchmarks that use different mutexes and barriers for fine-grained, WG granularity synchronization. Furthermore, AWG outperforms other solutions that do not deadlock in the oversubscribed case, such as fixed-interval round-robin context switching or naively extending monitor/mwait to GPUs, by 2.6x and 2.2x, respectively.
More
Translated text
Key words
progress,independent,work-groups
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined