Releasing the Potential of Tensor Core for Unstructured SpMM using Tiled-CSR Format

2023 IEEE 41st International Conference on Computer Design (ICCD), 2023

Abstract
The GPU has become a popular platform for AI applications, thanks in part to its Tensor Cores, which deliver high matrix-multiplication throughput. However, the Sparse Matrix Multiplication (SpMM) kernel remains a bottleneck despite significant advances in computing power: because of the Tensor Core's hardware mechanism, its programming granularity does not match SpMM. In this paper, we analyze why the unstructured SpMM kernel is ill-suited to the Tensor Core and propose the Tiled Compressed Sparse Row (Tiled-CSR) compression format. To address the low non-zero density of tiles in the Tiled-CSR format, we exploit a row shuffle algorithm to improve Tensor Core utilization and increase computing density. We also employ adaptive memory access modes and 3D-Grid tiling in the SpMM kernel to reduce memory access latency. Experimental results on an NVIDIA A100 GPU with matrices from the Deep Learning Matrix Collection (DLMC) demonstrate that the Tiled-CSR format improves Tensor Core utilization across sparsity levels, by a maximum of 3.89x at 50% sparsity and a minimum of 1.82x at 90% sparsity compared to the SR-BCRS format. Additionally, our kernel achieves an average speedup of 1.54x (up to 2.12x) over Magicube.
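The paper itself does not spell out the Tiled-CSR layout in this abstract, but the general idea of tile-partitioning a CSR matrix can be illustrated with a minimal sketch. The code below first builds a standard CSR representation, then regroups the non-zeros into fixed-size 2D tiles of the kind a Tensor Core fragment could consume. The function names, the tile key layout, and the per-tile coordinate encoding are illustrative assumptions, not the authors' actual format.

```python
def dense_to_csr(A):
    """Standard CSR: row pointers, column indices, values."""
    indptr = [0]
    indices, values = [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                values.append(v)
        indptr.append(len(indices))
    return indptr, indices, values

def csr_to_tiles(indptr, indices, values, tile_w):
    """Hypothetical tiling pass: bucket each non-zero by its
    (row tile, column tile) of width tile_w, storing local
    (row, col, value) coordinates inside each tile. A real
    Tensor Core kernel would then load one tile per fragment."""
    tiles = {}
    n_rows = len(indptr) - 1
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            key = (i // tile_w, indices[k] // tile_w)
            tiles.setdefault(key, []).append(
                (i % tile_w, indices[k] % tile_w, values[k]))
    return tiles
```

Tiles with few non-zeros waste Tensor Core lanes, which is exactly the low-density problem the abstract's row shuffle algorithm targets: reordering rows so that non-zeros cluster into fewer, denser tiles.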
Key words
unstructured sparse matrix multiplication, compression format, Tensor Core, bandwidth efficiency