An Input-Adaptive And In-Place Approach To Dense Tensor-Times-Matrix Multiply

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, Texas, November 2015

Abstract
This paper describes a novel framework, called INTENSLI ("intensely"), for producing fast single-node implementations of dense tensor-times-matrix multiply (TTM) of arbitrary dimension. Whereas conventional implementations of TTM rely on explicitly converting the input tensor operand into a matrix in order to use any available fast general matrix-matrix multiply (GEMM) implementation, our framework's strategy is to carry out the TTM in place, avoiding this copy. As the resulting implementations expose tuning parameters, this paper also describes a heuristic empirical model for selecting an optimal configuration based on the TTM's inputs. When compared to widely used single-node TTM implementations available in the Tensor Toolbox and Cyclops Tensor Framework (CTF), INTENSLI's in-place and input-adaptive TTM implementations achieve 4x and 13x speedups, showing GEMM-like performance on a variety of input sizes.
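To illustrate the conventional copy-based approach that the paper's in-place strategy avoids, here is a minimal NumPy sketch of mode-n TTM via explicit unfolding. The function name and layout choices are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def ttm_unfold(X, U, mode):
    """Conventional mode-`mode` TTM: Y = X x_mode U.

    This is the copy-based baseline: the tensor is explicitly
    unfolded into a matrix, multiplied with a single GEMM call,
    and folded back. The unfolding materializes a permuted copy
    of X, which is the overhead INTENSLI's in-place approach avoids.
    """
    # Mode-n unfolding: bring `mode` to the front, flatten the rest (copies X).
    Xm = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    # One large GEMM against the unfolded tensor.
    Ym = U @ Xm
    # Fold the result back into tensor form.
    new_shape = (U.shape[0],) + tuple(d for i, d in enumerate(X.shape) if i != mode)
    return np.moveaxis(Ym.reshape(new_shape), 0, mode)

# Example: a 3 x 4 x 5 tensor times a 6 x 4 matrix along mode 1.
X = np.random.rand(3, 4, 5)
U = np.random.rand(6, 4)
Y = ttm_unfold(X, U, mode=1)
print(Y.shape)  # (3, 6, 5)
```

The same contraction can be written as `np.einsum('ij,ajb->aib', U, X)` for mode 1; the unfolding formulation above is what lets a tuned GEMM library do the heavy lifting, at the cost of the explicit copy.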
Keywords
Multilinear algebra, tensor operation, code generation, offline autotuning