Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs
HiPC(2013)
Abstract
In this paper, we report on the development of an efficient GPU implementation of the Strassen-Winograd matrix multiplication algorithm for matrices of arbitrary sizes. We utilize multi-kernel streaming to exploit concurrency across sub-matrix operations in addition to intra-operation parallelism. We evaluate the performance of the implementation in comparison with CUBLAS-5.0 on Fermi and Kepler GPUs. The experimental results demonstrate the usefulness of Strassen's algorithm for practically relevant matrix sizes on GPUs, with up to 1.27X speedup for single-precision and 1.42X speedup for double-precision floating point computation.
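For readers unfamiliar with the algorithm, the following is a minimal CPU-side sketch of the Winograd variant of Strassen's recursion, which uses 7 block multiplications and 15 block additions per level. It is not the paper's GPU implementation. The function names, the `cutoff` parameter, and the use of zero-padding to handle odd dimensions are all assumptions made for illustration; the abstract does not say how the authors handle arbitrary sizes.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def naive_matmul(A, B):
    # Classical triple-loop product, used as the base case.
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def pad(M, rows, cols):
    # Zero-pad M to rows x cols (illustrative choice, not from the paper).
    out = [list(r) + [0] * (cols - len(r)) for r in M]
    out += [[0] * cols for _ in range(rows - len(M))]
    return out

def quadrants(M, r, c):
    # Split M into four blocks at row r and column c.
    top, bottom = M[:r], M[r:]
    return ([x[:c] for x in top], [x[c:] for x in top],
            [x[:c] for x in bottom], [x[c:] for x in bottom])

def strassen_winograd(A, B, cutoff=64):
    """Multiply A (m x k) by B (k x n); hypothetical helper for illustration."""
    cutoff = max(cutoff, 1)  # guarantees the recursion terminates
    m, k, n = len(A), len(B), len(B[0])
    if min(m, k, n) <= cutoff:
        return naive_matmul(A, B)

    # Round each dimension up to an even number so the blocks line up.
    mp, kp, np_ = m + m % 2, k + k % 2, n + n % 2
    A11, A12, A21, A22 = quadrants(pad(A, mp, kp), mp // 2, kp // 2)
    B11, B12, B21, B22 = quadrants(pad(B, kp, np_), kp // 2, np_ // 2)

    # 8 pre-additions.
    S1 = mat_add(A21, A22)
    S2 = mat_sub(S1, A11)
    S3 = mat_sub(A11, A21)
    S4 = mat_sub(A12, S2)
    T1 = mat_sub(B12, B11)
    T2 = mat_sub(B22, T1)
    T3 = mat_sub(B22, B12)
    T4 = mat_sub(T2, B21)

    # 7 recursive block products; they do not depend on one another.
    P1 = strassen_winograd(A11, B11, cutoff)
    P2 = strassen_winograd(A12, B21, cutoff)
    P3 = strassen_winograd(S4, B22, cutoff)
    P4 = strassen_winograd(A22, T4, cutoff)
    P5 = strassen_winograd(S1, T1, cutoff)
    P6 = strassen_winograd(S2, T2, cutoff)
    P7 = strassen_winograd(S3, T3, cutoff)

    # 7 post-additions.
    U2 = mat_add(P1, P6)
    U3 = mat_add(U2, P7)
    U4 = mat_add(U2, P5)
    C11 = mat_add(P1, P2)
    C12 = mat_add(U4, P3)
    C21 = mat_sub(U3, P4)
    C22 = mat_add(U3, P5)

    # Reassemble the blocks and crop the padding away.
    C = [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]
    return [row[:n] for row in C[:m]]
```

Once the S and T sums are formed, the seven products are mutually independent, which is one plausible reading of the "concurrency across sub-matrix operations" mentioned in the abstract. In a real implementation the base case would call a tuned GEMM routine rather than a triple loop, and the cutoff would be chosen empirically.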
Keywords
matrix multiplication, Kepler GPU, intra-operation parallelism, Strassen-Winograd matrix multiplication algorithm, double-precision floating point computation, graphics processing units, GPU implementation, multi-kernel streaming, Fermi GPU, CUBLAS-5.0, performance evaluation, Strassen's algorithm, sub-matrix operations, floating point arithmetic