Efficient and Simplified Parallel Graph Processing over CPU and MIC

2015 IEEE International Parallel and Distributed Processing Symposium (2015)

Citations: 35 | Views: 76
Abstract
Intel Xeon Phi (MIC architecture) is a relatively new accelerator chip that combines large-scale shared-memory parallelism with wide SIMD lanes. Mapping applications onto a node with such an architecture to achieve high parallel efficiency is a major challenge. In this paper, we focus on developing a system for heterogeneous graph processing that can utilize both a many-core Xeon Phi and a multi-core CPU on one node. We propose a simple programming API with an intuitive interface for expressing SIMD parallelism. We develop efficient techniques for supporting our high-level API, focusing on exploiting wide SIMD lanes, a massive number of cores, and partitioning of the work across the CPU and the accelerator, while handling the irregularity of graph applications. The components of our runtime system include a condensed static memory buffer, which supports efficient message insertion and SIMD message reduction while keeping memory requirements low, and, specifically for MIC, a pipelining scheme that makes message generation efficient by avoiding frequent locking operations. In addition, a hybrid graph partitioning module effectively partitions the workload between the CPU and the MIC, ensuring a balanced workload and low communication overhead. The main observations from our experimental evaluation using five popular applications are: for MIC executions, the pipelining scheme is up to 3.36x faster than a naive approach using locking-based message generation, and the speedup over OpenMP ranges from 1.17 to 4.15. Heterogeneous CPU-MIC execution achieves a speedup of up to 1.41 over the better of the CPU-only and MIC-only executions.
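The abstract does not describe the layout of the condensed static buffer or the stages of the pipelining scheme, so the sketch below only illustrates the general problem they address: generating per-vertex messages without taking a lock per message, then combining them in a reduction pass. It assumes a sum combiner, a dense one-slot-per-vertex buffer per worker, and hypothetical names (`scatter_and_reduce`, `num_workers`); it is not the paper's buffer or its pipelining design, and the workers are simulated sequentially rather than run as threads.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # hypothetical (src, dst) pair: a message flows src -> dst


def scatter_and_reduce(edges: List[Edge], values: List[float],
                       num_vertices: int, num_workers: int) -> List[float]:
    """Sum-combine messages per destination vertex without locks (sketch).

    Phase 1: each worker writes only into its own private dense buffer,
    so no two workers ever update the same slot and no lock is needed.
    Phase 2: the private buffers are reduced element-wise; in a native
    implementation this contiguous, branch-free loop is the part a
    compiler could map onto SIMD lanes.
    """
    private = [[0.0] * num_vertices for _ in range(num_workers)]
    chunk = (len(edges) + num_workers - 1) // num_workers
    # Each iteration stands in for one thread processing its own edge chunk.
    for w in range(num_workers):
        for src, dst in edges[w * chunk:(w + 1) * chunk]:
            private[w][dst] += values[src]

    # Reduction pass: combine all private buffers slot by slot.
    combined = [0.0] * num_vertices
    for buf in private:
        for v in range(num_vertices):
            combined[v] += buf[v]
    return combined


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
print(scatter_and_reduce(edges, [1.0, 2.0, 3.0, 4.0], num_vertices=4, num_workers=2))
# prints [4.0, 1.0, 3.0, 3.0]
```

The trade-off this naive version makes visible is memory: private dense buffers cost one slot per vertex per worker, which grows quickly with the core count of a many-core chip. That is the kind of overhead a condensed buffer that "keeps memory requirements low" would need to avoid.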
Keywords
CPU,Intel MIC,Graph Processing,Programming Model