Energy-Efficient Sparse Matrix Autotuning with CSX -- A Trade-off Study

Parallel and Distributed Processing Symposium Workshops & PhD Forum(2013)

引用 6|浏览0
暂无评分
摘要
In this paper, we apply a method for extracting a running power estimate of applications from hardware performance counters, producing power/time curves which can be integrated over particular intervals to estimate the energy consumption of individual application stages. We use this method to instrument executions of a conjugate gradient solver, to examine the energy and performance impacts of applying the Compressed Sparse eXtended (CSX) and classic Compressed Sparse Row (CSR) matrix compression methods to sparse linear systems from different application areas. The CSX format requires a preprocessing stage which identifies and exploits a range of matrix substructures, incurring a one-time cost which can facilitate more effective sparse matrix-vector multiplication (SpMV). As this numerical kernel is the primary performance bottleneck of conjugate gradient solvers, we take the approach of isolating the energy cost of preprocessing from a short sample of application iterations, obtaining measurements which enlighten the choice of which compression scheme is more appropriate to the input data. We examine the impact variable degrees of parallelism, processor clock frequency, and Hyper threading have on this trade-off. Our results include comparisons of empirically obtained results from all combinations of up to 8 threads on 4 hyper threaded cores, 3 clock frequencies, and 5 sample application matrices. We assess program-hardware interactions with views to structural properties of the data and hardware architectural features, and evaluate the approach with respect to integrating the energy instrumentation with present automatic performance tuning. Results show that our method is sufficiently precise to identify non-trivial tradeoffs in the parameter space, and may become suitable for a run-time automatic tuning scheme by applying a faster preprocessing mode of CSX.
更多
查看译文
关键词
primary performance bottleneck,energy instrumentation,present automatic performance tuning,energy consumption,performance impact,energy cost,individual application stage,application iteration,hardware performance counter,different application area,trade-off study,energy-efficient sparse matrix autotuning,hardware,sparse matrices,energy,power dissipation,matrix multiplication,radiation detectors
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要