Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs

Pai-Wei Lai, Humayun Arafat, Venmugil Elango, Ponnuswamy Sadayappan

HiPC (2013)

Abstract

In this paper, we report on the development of an efficient GPU implementation of the Strassen-Winograd matrix multiplication algorithm for matrices of arbitrary sizes. We utilize multi-kernel streaming to exploit concurrency across sub-matrix operations in addition to intra-operation parallelism. We evaluate the performance of the implementation in comparison with CUBLAS-5.0 on Fermi and Kepler GPUs. The experimental results demonstrate the usefulness of Strassen's algorithm for practically relevant matrix sizes on GPUs, with up to 1.27X speedup for single-precision and 1.42X speedup for double-precision floating point computation.
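The abstract does not spell out the recursion itself, so the following is a minimal CPU-side sketch of the Strassen-Winograd scheme (7 sub-matrix multiplications and 15 additions per level) for readers unfamiliar with it. It is not the authors' GPU implementation: the function names, the `cutoff` parameter, and the zero-padding used to handle arbitrary sizes are all assumptions made for illustration, and the paper may treat odd dimensions differently. The seven products `P1` to `P7` are mutually independent, which is the kind of concurrency across sub-matrix operations that a multi-stream GPU version could exploit.

```python
# Illustrative sketch only: pure-Python Strassen-Winograd on lists of rows.
# All names here are hypothetical; this is not the paper's GPU code.

def naive_matmul(A, B):
    """Classical O(n^3) product, used as the base case."""
    k, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(m)]
            for row in A]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M, h):
    """Split a 2h x 2h matrix into its four h x h quadrants."""
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def pad(M, size):
    """Zero-pad M to size x size (assumed strategy for arbitrary sizes)."""
    rows = [list(r) + [0] * (size - len(r)) for r in M]
    return rows + [[0] * size for _ in range(size - len(M))]

def strassen_winograd(A, B, cutoff=2):
    n, k, m = len(A), len(B), len(B[0])
    size = max(n, k, m)
    if size <= max(cutoff, 1):          # small problem: classical product
        return naive_matmul(A, B)
    size += size % 2                    # round up to an even dimension
    A, B = pad(A, size), pad(B, size)
    h = size // 2
    A11, A12, A21, A22 = split(A, h)
    B11, B12, B21, B22 = split(B, h)

    # 8 pre-additions
    S1 = add(A21, A22); S2 = sub(S1, A11); S3 = sub(A11, A21); S4 = sub(A12, S2)
    T1 = sub(B12, B11); T2 = sub(B22, T1); T3 = sub(B22, B12); T4 = sub(T2, B21)

    # 7 independent recursive products
    mul = lambda X, Y: strassen_winograd(X, Y, cutoff)
    P1 = mul(A11, B11); P2 = mul(A12, B21); P3 = mul(S4, B22); P4 = mul(A22, T4)
    P5 = mul(S1, T1);   P6 = mul(S2, T2);   P7 = mul(S3, T3)

    # 7 post-additions
    U2 = add(P1, P6); U3 = add(U2, P7); U4 = add(U2, P5)
    C11 = add(P1, P2); C12 = add(U4, P3)
    C21 = sub(U3, P4); C22 = add(U3, P5)

    # Reassemble the quadrants and strip the padding.
    C = [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]
    return [row[:m] for row in C[:n]]
```

For example, `strassen_winograd([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1)` returns `[[19, 22], [43, 50]]`. In practice the cutoff would be set much higher, so that the recursion stops while the sub-matrices are still large enough for a tuned classical kernel to run efficiently.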

Keywords

matrix multiplication, Kepler GPU, intra-operation parallelism, Strassen-Winograd matrix multiplication algorithm, double-precision floating point computation, graphics processing units, GPU implementation, multi-kernel streaming, Fermi GPU, CUBLAS-5.0, performance evaluation, Strassen's algorithm, sub-matrix operations, floating point arithmetic
