Bs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Balanced Split Strategy on the GPU

INFOCOM Workshops (2023)

Abstract
Sparse Matrix-Matrix Multiplication (SpMM) is a commonly used operation in many domains, particularly in the increasingly popular Graph Neural Network (GNN) frameworks. Current GPU-based SpMM kernels mainly adopt a row-splitting strategy, in which one warp of the GPU processes one row of the sparse matrix. However, because non-zero elements are unevenly distributed across the rows of a sparse matrix, this row-splitting method suffers from two problems: 1) load imbalance among warps, and 2) low parallel efficiency within a warp when processing long rows. We propose a balanced split strategy SpMM algorithm named Bs-SpMM. When storing the sparse matrix, we use a "part" rather than a "row" as the basic granularity, much as int8 can be more efficient than int16. This ensures that the basic work unit stays within a fixed size range and that a warp has enough threads to process all of a part's non-zero elements in parallel. When computing on the GPU, parts are processed in a balanced fashion, avoiding the large row-to-row variation in the number of non-zero elements; this naturally alleviates the load imbalance among warps. For memory transactions on the GPU, the temporary space a part requires is small enough that it can always reside in on-chip shared memory, ensuring better data locality. The non-zero elements within a part come from the same row, which preserves coalesced memory access to the dense matrix. On an NVIDIA RTX 3070 GPU, our experimental results on the SNAP Matrix Collection show that Bs-SpMM achieves an average speedup of 1.40x and 1.49x compared to NVIDIA cuSPARSE and the state-of-the-art GE-SpMM, respectively.
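The abstract includes no code, but the part-based scheme it describes can be sketched. Below is a minimal CUDA kernel illustrating one plausible reading: the CSR non-zero arrays are pre-split into parts of at most PART_SIZE non-zeros drawn from a single row, one warp processes one part, and rows that span several parts merge their partial sums with atomic adds. All names (bs_spmm_sketch, part_row, part_start, part_len) and parameters (PART_SIZE = 32, four warps per block) are assumptions made for illustration, not the authors' implementation; in particular, the paper may combine partial results differently than the atomicAdd used here.

```cuda
#include <cuda_runtime.h>

#define WARP_SIZE       32
#define PART_SIZE       32   // assumed cap on non-zeros per part
#define WARPS_PER_BLOCK 4    // launch with 128 threads per block

// C (M x N) += A (sparse, M x K) * B (dense, K x N), both dense row-major.
// part_row[p]   : row of A that part p belongs to
// part_start[p] : offset of the part's first non-zero in col_idx / vals
// part_len[p]   : number of non-zeros in the part (<= PART_SIZE)
__global__ void bs_spmm_sketch(int num_parts, int N,
                               const int*   __restrict__ part_row,
                               const int*   __restrict__ part_start,
                               const int*   __restrict__ part_len,
                               const int*   __restrict__ col_idx,
                               const float* __restrict__ vals,
                               const float* __restrict__ B,
                               float*       __restrict__ C)
{
    int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / WARP_SIZE;
    int lane    = threadIdx.x % WARP_SIZE;
    int wib     = threadIdx.x / WARP_SIZE;   // warp index within the block
    if (warp_id >= num_parts) return;        // whole warp exits together

    int row   = part_row[warp_id];
    int start = part_start[warp_id];
    int len   = part_len[warp_id];

    // Stage the part's non-zeros in shared memory: a part is small enough
    // to always fit on-chip, which is the locality argument in the abstract.
    __shared__ float s_val[WARPS_PER_BLOCK][PART_SIZE];
    __shared__ int   s_col[WARPS_PER_BLOCK][PART_SIZE];
    if (lane < len) {                        // len <= WARP_SIZE by construction
        s_val[wib][lane] = vals[start + lane];
        s_col[wib][lane] = col_idx[start + lane];
    }
    __syncwarp();

    // Lanes sweep the N output columns. All non-zeros of the part come from
    // one row of A, so for each k the reads of B's row s_col[wib][k] are
    // coalesced across the warp.
    for (int j = lane; j < N; j += WARP_SIZE) {
        float acc = 0.0f;
        for (int k = 0; k < len; ++k)
            acc += s_val[wib][k] * B[s_col[wib][k] * N + j];
        // A long row is split over several parts; merge partial sums.
        atomicAdd(&C[row * N + j], acc);
    }
}
```

With one warp per part, a launch would look like `bs_spmm_sketch<<<(num_parts * WARP_SIZE + 127) / 128, 128>>>(...)` on a zero-initialized C. Because every part is bounded by PART_SIZE, each warp's workload is roughly uniform regardless of how skewed the row lengths are, which is the load-balancing property the abstract claims.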
Keywords
Sparse matrix, GPU, balanced split strategy, SpMM