hcaPCG: A Heterogeneous and Communication-Avoiding PCG with Jacobi Preconditioner on SW26010-Pro Architecture

Min Tian, Yue Liu, Li Wang, Yanyan Chen, Qi Liu, Jingshan Pan

International Conference on Parallel and Distributed Systems (2023)

Abstract
Owing to its efficiency and versatility, the preconditioned conjugate gradient (PCG) algorithm has long been a staple among iterative solvers for linear systems. In this paper, we propose an optimized PCG algorithm tailored to the SW26010-Pro manycore processor. The main contributions are: optimizing data block sizes according to the processor's memory hierarchy; combining thread-level and data-level parallelism, with hand-written SIMD code to improve efficiency; optimizing the memory access pattern through direct memory access (DMA) and the overlapping of computation and communication; and using a shared-memory approach to store long vectors across all cores. Furthermore, we design an accelerated algorithm for reduction operations that avoids data communication. Experimental results show that hcaPCG achieves average speedups of up to 28.1× over the original implementation.