Vectorized Parallel Sparse Matrix-Vector Multiplication In Petsc Using Avx-512

PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING(2018)

引用 23|浏览38
暂无评分
摘要
Emerging many-core CPU architectures with high degrees of single-instruction, multiple data (SIMD) parallelism promise to enable increasingly ambitious simulations based on partial differential equations (PDEs) via extreme-scale computing. However, such architectures present several challenges to their efficient use. Here, we explore the efficient implementation of sparse matrix-vector (SpMV) multiplications-a critical kernel for the workhorse iterative linear solvers used in most PDE-based simulations-on recent CPU architectures from Intel as well as the second-generation Knights Landing Intel Xeon Phi, which features many CPU cores, wide SIMD lanes, and on-package high-bandwidth memory. Traditional SpMV algorithms use compressed sparse row storage format, which is a hindrance to exploiting wide SIMD lanes. We study alternative matrix formats and present an efficient optimized SpMV kernel, based on a sliced ELLPACK representation, which we have implemented in the PETSc library. In addition, we demonstrate the benefit of using this representation to accelerate preconditioned iterative solvers in realistic PDE-based simulations in parallel.
更多
查看译文
关键词
parallel SpMV, PETSc, vectorization, many-core, Xeon Phi
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要