Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs

HiPC (2013)

Citations: 16 | Views: 119
Abstract
In this paper, we report on the development of an efficient GPU implementation of the Strassen-Winograd matrix multiplication algorithm for matrices of arbitrary sizes. We utilize multi-kernel streaming to exploit concurrency across sub-matrix operations in addition to intra-operation parallelism. We evaluate the performance of the implementation in comparison with CUBLAS-5.0 on Fermi and Kepler GPUs. The experimental results demonstrate the usefulness of Strassen's algorithm for practically relevant matrix sizes on GPUs, with up to 1.27X speedup for single-precision and 1.42X speedup for double-precision floating point computation.
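
For readers unfamiliar with the Winograd variant of Strassen's algorithm, the following is a minimal sequential sketch in pure Python. It is not the paper's GPU implementation: it does not model multi-kernel streaming or CUBLAS calls. The function names, the `cutoff` parameter, and the zero-padding used to handle odd dimensions are all assumptions made for illustration, since the abstract does not say how arbitrary sizes are handled. The sketch shows the standard schedule of 7 sub-matrix products and 15 sub-matrix additions per recursion level.

```python
def naive_matmul(A, B):
    """Reference O(m*k*n) product of an m x k and a k x n matrix (lists of lists)."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)] for row in A]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def pad(M, rows, cols):
    """Zero-pad M to rows x cols (assumed strategy for odd dimensions)."""
    padded = [row + [0] * (cols - len(row)) for row in M]
    return padded + [[0] * cols for _ in range(rows - len(M))]


def split(M):
    """Split an even-sized matrix into four quadrants."""
    h, w = len(M) // 2, len(M[0]) // 2
    return ([r[:w] for r in M[:h]], [r[w:] for r in M[:h]],
            [r[:w] for r in M[h:]], [r[w:] for r in M[h:]])


def strassen_winograd(A, B, cutoff=2):
    """Multiply A (m x k) by B (k x n) with the Winograd variant of Strassen."""
    cutoff = max(cutoff, 1)  # guarantees the recursion terminates
    m, k, n = len(A), len(B), len(B[0])
    if min(m, k, n) <= cutoff:
        return naive_matmul(A, B)  # small blocks: fall back to the classic product

    # Round each dimension up to the next even number.
    A = pad(A, m + m % 2, k + k % 2)
    B = pad(B, k + k % 2, n + n % 2)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # 8 pre-additions
    S1 = add(A21, A22); S2 = sub(S1, A11); S3 = sub(A11, A21); S4 = sub(A12, S2)
    T1 = sub(B12, B11); T2 = sub(B22, T1); T3 = sub(B22, B12); T4 = sub(T2, B21)

    # 7 recursive products
    P1 = strassen_winograd(A11, B11, cutoff)
    P2 = strassen_winograd(A12, B21, cutoff)
    P3 = strassen_winograd(S4, B22, cutoff)
    P4 = strassen_winograd(A22, T4, cutoff)
    P5 = strassen_winograd(S1, T1, cutoff)
    P6 = strassen_winograd(S2, T2, cutoff)
    P7 = strassen_winograd(S3, T3, cutoff)

    # 7 post-additions
    C11 = add(P1, P2)
    U2 = add(P1, P6); U3 = add(U2, P7); U4 = add(U2, P5)
    C12 = add(U4, P3)
    C21 = sub(U3, P4)
    C22 = add(U3, P5)

    # Reassemble the quadrants and drop any padding.
    C = [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]
    return [row[:n] for row in C[:m]]
```

Several of the seven products are independent of one another, which is the kind of concurrency across sub-matrix operations that the abstract says is exploited with multi-kernel streaming; this sketch simply runs them one after another.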
Keywords
matrix multiplication, kepler gpu, intraoperation parallelism, strassen-winograd matrix multiplication algorithm, double-precision floating point computation, graphics processing units, gpu implementation, multikernel streaming, fermi gpu, cublas-5.0, performance evaluation, strassen's algorithm, submatrix operations, floating point arithmetic