Compilation of Modular and General Sparse Workspaces
arXiv (2024)
Abstract
Recent years have seen considerable work on compiling sparse tensor algebra
expressions. This paper addresses a shortcoming in that work, namely how to
generate efficient code (in time and space) that scatters values into a sparse
result tensor. We address this shortcoming through a compiler design that
generates code that uses sparse intermediate tensors (sparse workspaces) as
efficient adapters between compute code that scatters and result tensors that
do not support random insertion. Our compiler automatically detects sparse
scattering behavior in tensor expressions and inserts necessary intermediate
workspace tensors. We present an algorithm template for workspace insertion
that is the backbone of our code generation algorithm. Our algorithm template
is modular by design, supporting sparse workspaces that span multiple
user-defined implementations. Our evaluation shows that sparse workspaces can
be up to 27.12× faster than the dense workspaces of prior work. On the
other hand, dense workspaces can be up to 7.58× faster than the sparse
workspaces generated by our compiler in other situations, which motivates our
compiler design that supports both. Our compiler produces sequential code that
is competitive with hand-optimized linear and tensor algebra libraries on the
expressions they support, but that generalizes to any other expression. Sparse
workspaces are also more memory efficient than dense workspaces as they
compress away zeros. This compression can asymptotically decrease memory usage,
enabling tensor computations on data that would otherwise run out of memory.
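To illustrate the idea (this is an explanatory sketch, not the paper's generated code): in row-wise sparse matrix-matrix multiplication over CSR operands, the compute loop scatters partial products at random column positions, which a CSR result cannot absorb directly. A per-row sparse workspace (a hash map here) accepts the random insertions and is then compressed into the result's CSR arrays in sorted order.

```python
def spgemm_csr(a_pos, a_crd, a_val, b_pos, b_crd, b_val, n_rows):
    """Row-wise CSR x CSR multiply using a sparse workspace per row.

    Each matrix is given in CSR form: pos (row offsets), crd (column
    indices), val (values). The workspace dict absorbs scattered
    insertions; the result arrays are only ever appended to.
    """
    c_pos, c_crd, c_val = [0], [], []
    for i in range(n_rows):
        workspace = {}  # sparse workspace: column index -> accumulated value
        for pa in range(a_pos[i], a_pos[i + 1]):
            k, av = a_crd[pa], a_val[pa]
            for pb in range(b_pos[k], b_pos[k + 1]):
                j = b_crd[pb]
                # scatter: random insertion lands in the workspace,
                # never in the result tensor itself
                workspace[j] = workspace.get(j, 0.0) + av * b_val[pb]
        # compress: emit workspace entries into the result in column order
        for j in sorted(workspace):
            c_crd.append(j)
            c_val.append(workspace[j])
        c_pos.append(len(c_crd))
    return c_pos, c_crd, c_val
```

Because the workspace only holds the nonzero columns of one output row, its footprint tracks the row's actual sparsity rather than the full column dimension, which is the memory advantage over a dense workspace that the abstract describes.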