Low-synchronization orthogonalization schemes for s-step and pipelined Krylov solvers in Trilinos

Society for Industrial and Applied Mathematics eBooks (2020)

Abstract
We investigate two single-reduce orthogonalization schemes for both s-step and pipelined GMRES. The first is based on classical Gram-Schmidt with reorthogonalization (CGS2), and the second on modified Gram-Schmidt (MGS). Standard iterated CGS2 requires three global reductions. In standard MGS, the number of global reductions is proportional to the number of vectors against which we are orthogonalizing. In both cases, we can reduce this to a single global reduction, including reorthogonalization for accuracy. Our implementation is based on Trilinos software components and is therefore portable to different machine architectures with a single code base. We first demonstrate solver performance on the Intel Haswell nodes of the NERSC Cori supercomputer. For these experiments, we integrated our solvers into Nalu-Wind, a computational fluid dynamics application. At each time step, Nalu uses GMRES with a smoothed aggregation algebraic multigrid (SA-AMG) preconditioner to solve a pressure Poisson linear system. In this experiment, s-step GMRES reduced Nalu's total GMRES solve time by a factor of 1.4×. We then benchmarked the single-reduce orthogonalization schemes on the ORNL Summit supercomputer. In these experiments, our low-synchronization CGS2 and MGS improved the s-step GMRES performance by factors of 2.4× and 10.1×, respectively, on 384 NVIDIA V100 GPUs, while on the IBM Power9 CPUs, they improved the stability of the pipelined GMRES without increasing the iteration time.
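To make the reduction counts quoted above concrete, the following is a minimal sketch of the standard (not single-reduce) MGS and iterated CGS2 steps, with a counter standing in for a global reduction. It is not the paper's algorithm or the Trilinos implementation. The names `ReductionCounter`, `mgs_step`, `cgs2_step`, and `subtract_projection` are hypothetical, and the sketch assumes that each batch of inner products and each norm costs one reduction.

```python
import math


class ReductionCounter:
    """Hypothetical stand-in for a communicator: counts simulated global reductions."""

    def __init__(self):
        self.count = 0

    def dots(self, basis, w):
        # One batched reduction returns <q, w> for every q in basis.
        self.count += 1
        return [sum(qi * wi for qi, wi in zip(q, w)) for q in basis]

    def norm(self, w):
        # The 2-norm is counted as one further reduction.
        self.count += 1
        return math.sqrt(sum(wi * wi for wi in w))


def subtract_projection(w, coeffs, basis):
    # w - sum_j coeffs[j] * basis[j]; purely local work, no communication.
    out = list(w)
    for c, q in zip(coeffs, basis):
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out


def mgs_step(basis, w, comm):
    # Standard MGS: one reduction per basis vector, then one for the norm.
    for q in basis:
        (h,) = comm.dots([q], w)
        w = subtract_projection(w, [h], [q])
    nrm = comm.norm(w)
    return [wi / nrm for wi in w]


def cgs2_step(basis, w, comm):
    # Standard iterated CGS2: two batched projection passes, then the norm,
    # giving three reductions regardless of the basis size.
    for _ in range(2):
        h = comm.dots(basis, w)
        w = subtract_projection(w, h, basis)
    nrm = comm.norm(w)
    return [wi / nrm for wi in w]


if __name__ == "__main__":
    # Orthogonalize one vector against three orthonormal basis vectors.
    basis = [[1.0, 0, 0, 0, 0], [0, 1.0, 0, 0, 0], [0, 0, 1.0, 0, 0]]
    w = [1.0, 2.0, 3.0, 4.0, 5.0]
    for name, step in (("MGS", mgs_step), ("CGS2", cgs2_step)):
        comm = ReductionCounter()
        step(basis, w, comm)
        print(name, "reductions:", comm.count)
```

With three basis vectors, this sketch counts four reductions for MGS (three projections plus the norm) and three for CGS2. The MGS count grows with the basis size while the CGS2 count stays fixed, which is the gap the single-reduce variants described in the abstract aim to close.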
Keywords
Krylov solvers, low-synchronization