Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices

Mathematical Problems in Engineering (2021)

Abstract
Solving triangular systems is a key building block of the preconditioned GMRES algorithm. Inexact preconditioning is attractive because it exposes a high degree of parallelism on accelerators. In this paper, we propose and implement an iterative, inexact block triangular solve on multiple GPUs within PETSc's framework. In addition, by developing a distributed block sparse matrix-vector multiplication procedure and investigating optimized vector operations, we construct a multi-GPU-enabled preconditioned GMRES with a block Jacobi preconditioner. The implementation uses the GPU-Direct technique to avoid host-device memory copies. For performance comparison, we also evaluate the preconditioning step as implemented with PETSc's built-in structures and with the cuSPARSE library. Experiments show that the developed GMRES with inexact preconditioning on 8 GPUs achieves up to a 4.4x speedup over the CPU-only implementation with exact preconditioning using 8 MPI processes.
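The abstract does not say which iteration the inexact triangular solve uses. A common choice for GPUs is a fixed number of Jacobi sweeps. The minimal sketch below assumes that choice and contrasts it with exact forward substitution on a lower-triangular factor stored in CSR form. It is scalar, not block, and runs on the CPU. It is not the paper's PETSc/CUDA implementation, and all function and variable names are hypothetical.

In a block variant, the scalar diagonal division would become the application of an inverted small dense diagonal block. That detail is also an assumption here.

```python
"""Exact vs. inexact (Jacobi-sweep) lower-triangular solves: illustrative sketch.

Assumptions (not from the paper): L is lower triangular, stored in CSR
arrays (indptr, indices, data), with a nonzero diagonal in every row.
Scalar version only; a block variant would invert small diagonal blocks.
"""
from typing import List


def _row_update(indptr: List[int], indices: List[int], data: List[float],
                b: List[float], x: List[float], i: int) -> float:
    """Return (b[i] - sum_{j != i} L[i, j] * x[j]) / L[i, i] for row i."""
    s = b[i]
    diag = 0.0
    for k in range(indptr[i], indptr[i + 1]):
        j = indices[k]
        if j == i:
            diag = data[k]
        else:
            s -= data[k] * x[j]
    return s / diag


def forward_substitution(indptr, indices, data, b):
    """Exact solve. Row i reads x[0..i-1] updated in place, so rows are
    processed strictly one after another (limited parallelism)."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = _row_update(indptr, indices, data, b, x, i)
    return x


def jacobi_triangular_solve(indptr, indices, data, b, sweeps):
    """Inexact solve: x_{k+1} = D^{-1} (b - N x_k), with N the strictly lower
    part, starting from x_0 = 0. Each row in a sweep reads only the previous
    iterate, so all rows of one sweep could be updated in parallel."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = [_row_update(indptr, indices, data, b, x, i) for i in range(len(b))]
    return x


# Example: L = [[2, 0, 0], [1, 4, 0], [0, 3, 5]] in CSR form, right-hand side B.
INDPTR = [0, 1, 3, 5]
INDICES = [0, 0, 1, 1, 2]
DATA = [2.0, 1.0, 4.0, 3.0, 5.0]
B = [2.0, 9.0, 20.0]

if __name__ == "__main__":
    print("exact:   ", forward_substitution(INDPTR, INDICES, DATA, B))
    for s in (1, 2, 3):
        print(f"{s} sweep(s):", jacobi_triangular_solve(INDPTR, INDICES, DATA, B, s))
```

Because the iteration matrix D^{-1}N is strictly lower triangular (nilpotent), n sweeps reproduce the exact solution in exact arithmetic. In this 3x3 example, 3 sweeps match forward substitution, while fewer sweeps give an approximation.

A fixed sweep count started from x_0 = 0 is a fixed linear map of b. An inexact preconditioner built this way therefore stays the same operator across GMRES iterations.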