Design and Implementation of External Storage Large-Scale Graph Computing System.

HP3C '23: Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications (2023)

Abstract
With the rise of big data, graph computing has become prevalent in many fields, and large-scale graph computing systems have emerged to meet this demand. Most existing systems adopt in-memory computing frameworks, but the rapid growth of data scale often exceeds the available memory. To address this, an external storage-based graph computing system called DCGraph has been designed and implemented. The bottleneck of single-machine, external storage-based large-scale graph computing systems usually lies in external storage I/O, so DCGraph is optimized for it. In the preprocessing stage, graph data is compressed and transformed into a two-dimensional CSC format in which each block records the offset of its data. This guarantees that the data of the same target vertex is stored contiguously in external storage within the same block. The DCSC format is selected for compressing data blocks that contain many vertices with zero in-degree. Additionally, a jump-out calculation mode suited to the CSC format is designed for specific graph algorithms: during data reading, the system decides whether to skip the current record based on its index and offset. Selective scheduling skips inactive data blocks during each iteration to avoid unnecessary data reading and processing. Together, these techniques reduce both the number of I/O operations and the total I/O volume. Experimental results show that, compared with GridGraph, DCGraph achieves a speedup of 1.5 to 2.8 times while reducing the total I/O volume during computation to less than half that of GridGraph.
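The storage ideas named in the abstract can be illustrated with a small in-memory sketch. This is not DCGraph's implementation. The function names (`build_csc`, `csc_to_dcsc`, `active_source_partitions`), the array layout, and the rule that a block is inactive when its source partition has no active vertex are all assumptions made for illustration. The sketch groups in-edges by target vertex (CSC), drops empty columns for zero in-degree vertices (DCSC), and picks which source partitions need to be read in an iteration.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source, target)


def build_csc(num_vertices: int, edges: List[Edge]) -> Tuple[List[int], List[int]]:
    """Group in-edges by target vertex.

    Returns (col_ptr, row_idx): the sources of the in-edges of vertex v
    are row_idx[col_ptr[v]:col_ptr[v + 1]], stored contiguously.
    """
    counts = [0] * num_vertices
    for _, dst in edges:
        counts[dst] += 1
    col_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        col_ptr[v + 1] = col_ptr[v] + counts[v]
    row_idx = [0] * len(edges)
    cursor = col_ptr[:-1].copy()  # next free slot per target vertex
    for src, dst in edges:
        row_idx[cursor[dst]] = src
        cursor[dst] += 1
    return col_ptr, row_idx


def csc_to_dcsc(col_ptr: List[int], row_idx: List[int]):
    """Keep only columns (targets) that have at least one in-edge.

    Returns (col_ids, dptr, row_idx): col_ids[i] is a non-empty target and
    its sources are row_idx[dptr[i]:dptr[i + 1]].
    """
    col_ids: List[int] = []
    dptr: List[int] = [0]
    for v in range(len(col_ptr) - 1):
        if col_ptr[v + 1] > col_ptr[v]:
            col_ids.append(v)
            dptr.append(col_ptr[v + 1])
    return col_ids, dptr, row_idx


def active_source_partitions(active: List[bool], part_size: int) -> List[int]:
    """Assumed scheduling rule: read a block only if its source partition
    contains at least one active vertex."""
    num_parts = (len(active) + part_size - 1) // part_size
    return [p for p in range(num_parts)
            if any(active[p * part_size:(p + 1) * part_size])]


if __name__ == "__main__":
    edges = [(0, 2), (1, 2), (3, 2), (2, 5)]
    col_ptr, row_idx = build_csc(6, edges)
    print(col_ptr, row_idx)
    print(csc_to_dcsc(col_ptr, row_idx))
    print(active_source_partitions([False, False, True, False, False, False], 2))
```

In this toy graph four of the six vertices have zero in-degree, so the DCSC pointer array holds three entries where the CSC pointer array holds seven. That is the kind of saving the abstract attributes to DCSC for sparse blocks.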