Trading Off Memory For Parallelism Quality

semanticscholar(2011)

Abstract
We detail an algorithm implemented in the R-Stream compiler to perform controlled array expansion and conversion to partial single-assignment form. It consists of (1) allowing our automatic code optimizer to selectively ignore false dependences in order to extract a good tradeoff between locality and parallelism, (2) detecting exactly all the causes of semantics violations in the relaxed schedule of the program, and (3) incrementally correcting violations by minimal amounts of renaming and expansion. In particular, our algorithm may ignore all false dependences and extract the maximal available parallelism in the program given a limit on the amount of expansion. The spectrum of memory consumption then varies between no expansion and total single assignment, with many steps between those extremes. The exposed parallelism can be incrementally reduced to match more closely the number and organization of processing elements available in the target hardware and, by the same token, to reduce the program's memory footprint. We extend our correction scheme into an iterative algorithm that tailors the mapping of the program for a good tradeoff between parallelism, locality and memory consumption. We demonstrate the power of our technique by optimizing a radar benchmark comprising a sequence of BLAS calls. By applying our technique and optimizing at a global level, we reach significant performance improvements over an implementation based on vendor-optimized math library calls. Our technique also has implications for algorithm selection.
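The tradeoff described in the abstract can be illustrated with a toy example. The sketch below is not the R-Stream algorithm; it is a minimal hand-written illustration, and every name in it (`reference`, `relaxed`, `block`, `cells`) is hypothetical. A loop reuses one temporary `t`, which creates false (write-after-write and write-after-read) dependences between iterations. The relaxed schedule ignores them and runs all producers of a block before all consumers. Expanding `t` to `cells` storage slots repairs the violation once `cells` is at least the block size, so `cells = 1` corresponds to no expansion and `cells = len(a)` to total single assignment for `t`.

```python
def reference(a):
    """Original sequential loop: the scalar t is reused by every iteration."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2       # producer: writes t
        out[i] = t + 1     # consumer: reads t
    return out


def relaxed(a, block, cells):
    """Relaxed schedule that ignores the false dependences on t.

    Iterations are grouped in blocks of `block`; inside a block, every
    producer runs before any consumer (as if producers ran in parallel).
    The temporary is expanded to `cells` slots; iteration i uses slot
    i % cells. Both parameters are illustrative assumptions.
    """
    n = len(a)
    out = [0] * n
    t = [0] * cells                      # expanded temporary
    for start in range(0, n, block):
        stop = min(start + block, n)
        for i in range(start, stop):     # all producers of the block
            t[i % cells] = a[i] * 2
        for i in range(start, stop):     # all consumers of the block
            out[i] = t[i % cells] + 1
    return out


a = [3, 1, 4, 1, 5, 9, 2, 6]
expected = reference(a)

# No expansion under the relaxed schedule: slots are overwritten too early.
violates = relaxed(a, block=4, cells=1) != expected

# Partial expansion (4 slots) is enough for blocks of 4 iterations.
partial_ok = relaxed(a, block=4, cells=4) == expected

# Full expansion (one slot per iteration) allows all producers to run first.
full_ok = relaxed(a, block=len(a), cells=len(a)) == expected
```

In this toy setting, shrinking `block` reduces the exposed parallelism and, with it, the number of slots needed for a correct result, which mirrors the range between no expansion and total single assignment that the abstract describes.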