Composing Loop-carried Dependence with Other Loops

arxiv(2021)

引用 0|浏览2
暂无评分
摘要
Sparse fusion is a compile-time loop transformation and runtime scheduling implemented as a domain-specific code generator. Sparse fusion generates efficient parallel code for the combination of two sparse matrix kernels where at least one of the kernels has loop-carried dependencies. Available implementations optimize individual sparse kernels. When optimized separately, the irregular dependence patterns of sparse kernels create synchronization overheads and load imbalance, and their irregular memory access patterns result in inefficient cache usage, which reduces parallel efficiency. Sparse fusion uses a novel inspection strategy with code transformations to generate parallel fused code for sparse kernel combinations that is optimized for data locality and load balance. Code generated by Sparse fusion outperforms the existing implementations ParSy and MKL on average 1.6X and 5.1X respectively and outperforms the LBC and DAGP coarsening strategies applied to a fused data dependence graph on average 5.1X and 7.2X respectively for various kernel combinations.
更多
查看译文
关键词
other loops,dependence,loop-carried
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要