Distributed Exact Structural Clustering on Large Graph

ICPADS(2023)

Abstract
Graph clustering is an important technique for detecting communities in complex networks. SCAN (Structural Clustering Algorithm for Networks) is a well-studied graph clustering algorithm that has been widely applied over the years. However, the processing time of sequential SCAN and its variants is intolerable on large graphs. Existing parallel variants of SCAN focus on fully utilizing the computing capacity of multi-core architectures and on sophisticated optimization techniques for a single computing node. As objects and their relationships in cyberspace vary over time, the scale of graph data is growing rapidly. Single-node graph clustering algorithms are constrained by limited computing resources, such as computing performance, memory size, and storage volume, so distributed algorithms are needed to process large graphs. This work presents a distributed structural graph clustering algorithm built on Spark. The edge pruning technique and adaptive checking are optimized to improve clustering efficiency, and label propagation clustering is simplified to reduce the communication cost of the distributed clustering iterations. Extensive experiments on real-world datasets evaluate the efficiency and scalability of the distributed algorithm. The results show that the algorithm clusters efficiently and scales well under different settings.
Keywords
Structural Clustering, Distributed Computing, Spark