Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512

Hong Zhang, Richard Tran Mills, Karl Rupp, Barry F. Smith

Proceedings of the 47th International Conference on Parallel Processing (2018)

Abstract

Emerging many-core CPU architectures with high degrees of single-instruction, multiple-data (SIMD) parallelism promise to enable increasingly ambitious simulations based on partial differential equations (PDEs) via extreme-scale computing. However, such architectures present several challenges to their efficient use. Here, we explore the efficient implementation of sparse matrix-vector (SpMV) multiplication, a critical kernel for the iterative linear solvers that are the workhorse of most PDE-based simulations, on recent Intel CPU architectures as well as the second-generation Intel Xeon Phi (Knights Landing), which features many CPU cores, wide SIMD lanes, and on-package high-bandwidth memory. Traditional SpMV algorithms use the compressed sparse row (CSR) storage format, which hinders the exploitation of wide SIMD lanes. We study alternative matrix formats and present an efficient, optimized SpMV kernel based on a sliced ELLPACK representation, which we have implemented in the PETSc library. In addition, we demonstrate the benefit of using this representation to accelerate preconditioned iterative solvers in realistic parallel PDE-based simulations.
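To make the contrast between CSR and sliced ELLPACK concrete, here is a minimal scalar sketch in C. It is not PETSc's implementation and uses no AVX-512 intrinsics. Several choices here are assumptions made for illustration: the slice height of 8 (chosen to match eight doubles in one 512-bit register), padding short rows with column index 0 and value 0.0, and the hypothetical names `SellMatrix`, `csr_to_sell`, `spmv_sell`, and `demo_tridiagonal`. Each slice of 8 rows is padded to its longest row and stored column-major, so the innermost loop in `spmv_sell` runs over 8 contiguous entries with a fixed trip count, which is the loop a compiler may map onto SIMD lanes. The CSR inner loop instead has a per-row trip count.

```c
#include <stdlib.h>

#define SLICE_HEIGHT 8 /* assumption: 8 doubles fill one 512-bit AVX-512 register */

/* Reference CSR kernel: y = A*x, one row at a time (variable-length inner loop). */
void spmv_csr(int m, const int *rowptr, const int *colidx,
              const double *val, const double *x, double *y) {
  for (int i = 0; i < m; i++) {
    double sum = 0.0;
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++) sum += val[k] * x[colidx[k]];
    y[i] = sum;
  }
}

/* Hypothetical sliced ELLPACK container. */
typedef struct {
  int m;         /* number of rows */
  int nslices;   /* ceil(m / SLICE_HEIGHT) */
  int *sliceptr; /* offset of each slice in colidx/val, length nslices + 1 */
  int *colidx;   /* column indices, column-major within each slice */
  double *val;   /* values, column-major within each slice */
} SellMatrix;

/* Convert CSR to sliced ELLPACK, padding each slice to its longest row. */
SellMatrix csr_to_sell(int m, const int *rowptr, const int *colidx, const double *val) {
  SellMatrix S;
  S.m = m;
  S.nslices = (m + SLICE_HEIGHT - 1) / SLICE_HEIGHT;
  S.sliceptr = malloc((size_t)(S.nslices + 1) * sizeof(int));
  S.sliceptr[0] = 0;
  for (int s = 0; s < S.nslices; s++) {
    int maxlen = 0;
    for (int r = 0; r < SLICE_HEIGHT; r++) {
      int i = s * SLICE_HEIGHT + r;
      if (i < m && rowptr[i + 1] - rowptr[i] > maxlen) maxlen = rowptr[i + 1] - rowptr[i];
    }
    S.sliceptr[s + 1] = S.sliceptr[s] + maxlen * SLICE_HEIGHT;
  }
  int total = S.sliceptr[S.nslices];
  S.colidx = calloc((size_t)total, sizeof(int)); /* padding: column 0 (assumption) */
  S.val = calloc((size_t)total, sizeof(double)); /* padding: value 0.0 */
  for (int s = 0; s < S.nslices; s++)
    for (int r = 0; r < SLICE_HEIGHT; r++) {
      int i = s * SLICE_HEIGHT + r;
      if (i >= m) continue;
      for (int k = rowptr[i], j = 0; k < rowptr[i + 1]; k++, j++) {
        int pos = S.sliceptr[s] + j * SLICE_HEIGHT + r; /* j-th entry of row r */
        S.colidx[pos] = colidx[k];
        S.val[pos] = val[k];
      }
    }
  return S;
}

/* Sliced ELLPACK kernel: the inner r-loop has a fixed length and unit stride. */
void spmv_sell(const SellMatrix *S, const double *x, double *y) {
  for (int s = 0; s < S->nslices; s++) {
    double acc[SLICE_HEIGHT] = {0.0};
    int width = (S->sliceptr[s + 1] - S->sliceptr[s]) / SLICE_HEIGHT;
    for (int j = 0; j < width; j++) {
      int base = S->sliceptr[s] + j * SLICE_HEIGHT;
      for (int r = 0; r < SLICE_HEIGHT; r++) /* candidate loop for SIMD */
        acc[r] += S->val[base + r] * x[S->colidx[base + r]];
    }
    for (int r = 0; r < SLICE_HEIGHT; r++) {
      int i = s * SLICE_HEIGHT + r;
      if (i < S->m) y[i] = acc[r]; /* drop padded rows of the last slice */
    }
  }
}

/* Builds a 10x10 tridiagonal (-1, 2, -1) matrix, applies both kernels to
   x = (1, 2, ..., 10), and returns the number of rows. */
int demo_tridiagonal(double *y_csr, double *y_sell) {
  enum { N = 10 };
  int rowptr[N + 1], colidx[3 * N], nnz = 0;
  double val[3 * N], x[N];
  for (int i = 0; i < N; i++) {
    rowptr[i] = nnz;
    if (i > 0) { colidx[nnz] = i - 1; val[nnz++] = -1.0; }
    colidx[nnz] = i; val[nnz++] = 2.0;
    if (i < N - 1) { colidx[nnz] = i + 1; val[nnz++] = -1.0; }
    x[i] = i + 1.0;
  }
  rowptr[N] = nnz;
  spmv_csr(N, rowptr, colidx, val, x, y_csr);
  SellMatrix S = csr_to_sell(N, rowptr, colidx, val);
  spmv_sell(&S, x, y_sell);
  free(S.sliceptr); free(S.colidx); free(S.val);
  return N;
}
```

In this example both kernels give 0 for the first nine rows and 11 for the last. The cost of the layout is visible in `csr_to_sell`: rows shorter than the longest row in their slice are padded with explicit zeros, so the format pays off when row lengths within a slice are similar.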

Keywords

parallel SpMV, PETSc, vectorization, many-core, Xeon Phi
