A Data Locality-Aware Design Framework For Reconfigurable Sparse Matrix-Vector Multiplication Kernel

ICCAD (2016)

Abstract
Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. For performance improvement, software libraries designed for SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by hardware platforms, because the efficiency of the SpMV kernel is greatly constrained by the limited memory bandwidth and irregular data access patterns. In this work, we propose a data locality-aware design framework for FPGA-based SpMV acceleration. We first include the hardware constraints in sparse matrix compression at the software level to regularize memory allocation and accesses. Moreover, a distributed architecture composed of processing elements is developed to improve computation parallelism. We implement the reconfigurable SpMV kernel on a Convey HC-2ex and conduct the evaluation using the University of Florida sparse matrix collection. The experiments demonstrate an average computational efficiency of 48.2%, which is far better than that of the CPU and GPU implementations. Our FPGA-based kernel has a runtime comparable to the GPU implementation and achieves a 2.1x reduction compared to the CPU. Moreover, our design obtains substantial savings in energy consumption: 9.3x and 5.6x better than the implementations on CPU and GPU, respectively.
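To make the memory-bandwidth bottleneck concrete, below is a minimal sketch of an SpMV kernel in C using the common compressed sparse row (CSR) format. The function name spmv_csr, the array layout, and the 3x3 example matrix are illustrative assumptions, not the paper's hardware-aware compression scheme; the indirect gathers from x through col_idx show the irregular access pattern that the proposed framework regularizes.

```c
#include <stdio.h>

/* Minimal CSR-based SpMV sketch: y = A * x.
 * Generic illustration of the kernel being accelerated, not the paper's
 * locality-aware compression format. */
static void spmv_csr(int n_rows,
                     const int *row_ptr,   /* length n_rows + 1 */
                     const int *col_idx,   /* length nnz */
                     const double *val,    /* length nnz */
                     const double *x,
                     double *y)
{
    for (int i = 0; i < n_rows; ++i) {
        double acc = 0.0;
        /* Indirect accesses to x via col_idx are the irregular memory
         * pattern that limits bandwidth utilization on CPUs and GPUs. */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            acc += val[k] * x[col_idx[k]];
        y[i] = acc;
    }
}

int main(void)
{
    /* 3x3 example matrix:  [10  0  2]
     *                      [ 0  3  0]
     *                      [ 1  0  4]  */
    int    row_ptr[] = {0, 2, 3, 5};
    int    col_idx[] = {0, 2, 1, 0, 2};
    double val[]     = {10.0, 2.0, 3.0, 1.0, 4.0};
    double x[]       = {1.0, 2.0, 3.0};
    double y[3];

    spmv_csr(3, row_ptr, col_idx, val, x, y);
    for (int i = 0; i < 3; ++i)
        printf("y[%d] = %g\n", i, y[i]);   /* expected: 16, 6, 13 */
    return 0;
}
```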
Keywords
data locality-aware design, reconfigurable sparse matrix-vector multiplication kernel, computational kernel, software libraries, MKL library, cuSPARSE library, GPU library, floating-point performance, SpMV kernel, data access patterns, FPGA-based SpMV acceleration, hardware constraints, sparse matrix compression, memory allocation, memory accesses, distributed architecture, computation parallelism, reconfigurable SpMV kernel, Convey HC-2ex, University of Florida sparse matrix collection, CPU implementations, GPU implementations, FPGA-based kernel, energy consumption