Task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment

Han Jiao, Jilin Zhang, Tomohiro Suzuki

THE PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION, HPC ASIA 2024(2024)

Abstract
The primary targets for improving the efficiency of large-scale matrix factorization are reducing synchronization, overlapping communication with computation, and improving load balance. In recent years, tiled algorithms with task parallelism on multicore shared memory systems have become well established as efficient methods for conducting fine-grained computations on smaller tiles. Moreover, they give the runtime system flexibility in execution order in many situations. However, traditional hybrid MPI and OpenMP programs for distributed memory systems use a fork-join model for the threads in each process, so thread-parallel computation tasks alternate with sequential communication tasks. In this paper, we incorporate task parallelism and low-rank approximation into a hybrid task-based Cholesky factorization in a distributed environment and propose several low-rank variants. We evaluate the performance of our programs on both full-rank and low-rank inputs and report the pros and cons of the proposed programs.
Keywords
matrix factorization, task parallelism, MPI plus OpenMP, low-rank approximation