Scalable Sparse Tensor Decompositions In Distributed Memory Systems

SC(2015)

引用 136|浏览68
暂无评分
摘要
We investigate an efficient parallelization of the most common iterative sparse tensor decomposition algorithms on distributed memory systems. A key operation in each iteration of these algorithms is the matricized tensor times Khatri-Rao product (MTTKRP). This operation amounts to element-wise vector multiplication and reduction depending on the sparsity of the tensor. We investigate a fine and a coarse-grain task definition for this operation, and propose hypergraph partitioning-based methods for these task definitions to achieve the load balance as well as reduce the communication requirements. We also design a distributed memory sparse tensor library, HyperTensor, which implements a well-known algorithm for the CANDECOMP/ PARAFAC (CP) tensor decomposition using the task definitions and the associated partitioning methods. We use this library to test the proposed implementation of MTTKRP in CP decomposition context, and report scalability results up to 1024 MPI ranks. We observed up to 194 fold speedups using 512 MPI processes on a well-known real world data, and significantly better performance results with respect to a state of the art implementation.
更多
查看译文
关键词
sparse tensor decomposition,distributed memory system,parallelization,matricized tensor times Khatri-Rao product,MTTKRP,hypergraph partitioning,load balancing,CANDECOMP-/PARAFAC tensor decomposition,CP tensor decomposition,task definition,message passing interface,MPI
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要