Taking GPU Programming Models to Task for Performance Portability
arXiv (2024)
Abstract
Ensuring high productivity in scientific software development necessitates
developing and maintaining a single codebase that can run efficiently on a
range of accelerator-based supercomputing platforms. While prior work has
investigated the performance portability of a few selected proxy applications
or programming models, this paper provides a comprehensive study of a range of
proxy applications implemented in the major programming models suitable for
GPU-based platforms. We present and analyze performance results across NVIDIA
and AMD GPU hardware currently deployed in leadership-class computing
facilities using a representative range of scientific codes and several
programming models – CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL. Based
on the specific characteristics of the applications tested, we include
recommendations to developers on how to choose the right programming model for
their code. We find that Kokkos, RAJA, and SYCL in particular empirically offer
the most promise as performance-portable programming models. These results
provide a comprehensive evaluation of the extent to which each programming
model for heterogeneous systems provides true performance portability in
real-world usage.