An efficient iterative graph data processing framework based on bulk synchronous parallel model

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2020)

Abstract
Graph data processing is widely applied in domains such as industry, science, and social networks, and has attracted considerable research effort. To keep pace with the rapid growth of big graph data, Pregel-like systems are regarded as one of the most promising approaches. However, such systems remain at an early stage and face several challenges. In Pregel, superstep synchronization is time-consuming because iterative graph computation requires many synchronizations. Furthermore, Pregel's graph partitioning strategy does not support load balancing, so network I/O overhead increases as the graph grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel (BSP) model. The global synchronization control mechanism is improved by determining the start time of the next superstep from a count of the global message files. In addition, an improved graph partitioning mechanism based on a balanced hash method is proposed to reduce communication overhead between the partitions that host sub-graph computational tasks. We also redesign the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on several real-world datasets show that the framework outperforms Giraph (an open-source Pregel-like system) by 58%-69% and achieves a 10x-17x performance improvement over Hadoop.
Keywords
bulk synchronous parallel model,graph data processing,graph partition,global synchronization,MapReduce
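The abstract names two mechanisms without giving their details: a balanced hash partitioner and a superstep barrier driven by a count of global message files. The sketch below is one possible reading of each idea, not the paper's implementation. The capacity-capped linear probing, the file-name pattern `superstep-<n>-...`, and all function names are assumptions made for illustration. It also balances only vertex counts and does not model edge cuts or communication cost.

```python
import math
import os


def hash_partition(vertex_ids, num_parts):
    """Plain modulo-hash placement, shown for comparison."""
    return {v: hash(v) % num_parts for v in vertex_ids}


def balanced_hash_partition(vertex_ids, num_parts):
    """Hash placement with a per-partition capacity cap (assumed scheme).

    Each vertex starts at its hash bucket. If that bucket is already at
    capacity, the vertex moves to the next partition that still has room.
    """
    vertex_ids = list(vertex_ids)
    capacity = math.ceil(len(vertex_ids) / num_parts)
    sizes = [0] * num_parts
    assignment = {}
    for v in vertex_ids:
        p = hash(v) % num_parts
        # Total capacity is at least the vertex count, so this loop ends.
        while sizes[p] >= capacity:
            p = (p + 1) % num_parts
        assignment[v] = p
        sizes[p] += 1
    return assignment


def partition_sizes(assignment, num_parts):
    """Count how many vertices landed in each partition."""
    sizes = [0] * num_parts
    for p in assignment.values():
        sizes[p] += 1
    return sizes


def superstep_complete(message_dir, superstep, expected_files):
    """Return True once enough message files exist for this superstep.

    Assumes each worker writes one file named 'superstep-<n>-...' into a
    shared directory; the naming convention is hypothetical.
    """
    prefix = f"superstep-{superstep}-"
    count = sum(1 for name in os.listdir(message_dir) if name.startswith(prefix))
    return count >= expected_files
```

With ten integer vertex IDs that are all multiples of 4 and four partitions, `hash_partition` places every vertex in partition 0, while `balanced_hash_partition` yields partition sizes `[3, 3, 3, 1]`. A coordinator could poll `superstep_complete` and start the next superstep once it returns `True`.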