Sparse Tensor Factorization On Many-Core Processors With High-Bandwidth Memory

2017 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Abstract
HPC systems are increasingly used for data-intensive computations that exhibit irregular memory accesses, non-uniform work distributions, large memory footprints, and high memory bandwidth demands. To address these challenging demands, HPC systems are turning to many-core architectures that feature a large number of energy-efficient cores backed by high-bandwidth memory. These features are exemplified in Intel's recent Knights Landing many-core processor (KNL), which typically has 68 cores and 16 GB of on-package multi-channel DRAM (MCDRAM). This work investigates how the novel architectural features offered by KNL can be used in the context of decomposing sparse, unstructured tensors using the canonical polyadic decomposition (CPD). The CPD is used extensively to analyze large multi-way datasets arising in various areas including precision healthcare, cybersecurity, and e-commerce. To this end, we (i) develop problem decompositions for the CPD which are amenable to hundreds of concurrent threads while maintaining load balance and low synchronization costs; and (ii) explore the utilization of architectural features such as MCDRAM. Using one KNL processor, our algorithm achieves up to 1.8x speedup over a dual-socket Intel Xeon system with 44 cores.
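
For readers unfamiliar with the CPD, the following is a minimal illustrative sketch of the model itself for a three-way tensor, not the paper's parallel algorithm. It assumes NumPy and uses hypothetical helper names (`cpd_reconstruct`, `cpd_at_nonzeros`) that do not come from the paper. A rank-R CPD approximates a tensor as a sum of R outer products of the columns of three factor matrices; for a sparse tensor, the model is typically evaluated only at the stored nonzero coordinates.

```python
import numpy as np

def cpd_reconstruct(factors):
    """Dense reconstruction of a 3-way rank-R CPD model.

    factors: tuple (A, B, C) with shapes (I, R), (J, R), (K, R).
    Returns the (I, J, K) array whose entry (i, j, k) is
    sum over r of A[i, r] * B[j, r] * C[k, r].
    """
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cpd_at_nonzeros(factors, coords):
    """Evaluate the same model only at selected coordinates.

    coords: integer array of shape (nnz, 3), one (i, j, k) triple per row,
    as in a coordinate-format sparse tensor (an assumption of this sketch).
    """
    A, B, C = factors
    i, j, k = coords[:, 0], coords[:, 1], coords[:, 2]
    # Row-wise product of the selected factor rows, summed over the rank.
    return np.sum(A[i] * B[j] * C[k], axis=1)
```

Evaluating at the nonzeros avoids materializing the dense I x J x K array, which is the reason sparse formats matter for the large, irregular datasets the abstract describes.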
Keywords
sparse tensor factorization, many-core processors, high-bandwidth memory, HPC systems, data intensive computations, irregular memory accesses, nonuniform work distributions, large memory footprints, high memory bandwidth demands, many-core architectures, energy-efficient cores, Knights Landing many-core processor, unstructured tensors, canonical polyadic decomposition, CPD, multiway datasets, precision healthcare, cybersecurity, e-commerce, synchronization costs, load balance, MCDRAM, KNL processor, dual socket Intel Xeon system