Implementation of GraphFrames-Based Parallelized Label Propagation Algorithm in Clusters

2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)(2022)

Abstract
In the era of big data, the number of network users has exploded, networks contain ever more nodes, and the relationships between nodes have become increasingly intricate. Ordinary university students who lack a big data experimental environment cannot use the traditional label propagation algorithm to process large-scale complex network data efficiently. To solve these problems, this paper proposes a parallelized label propagation algorithm based on GraphFrames. First, a multi-node big data cluster is built from existing university computer-lab resources, and GraphFrames is then used to parallelize the label propagation algorithm in this cluster environment. Experiments show that the GraphFrames-based parallelized label propagation algorithm can easily cope with large-scale complex networks with millions of nodes. The relationship between the algorithm's running time and the number of nodes in the cluster is explored by varying the cluster size. In terms of community detection quality, the F-measure on a large-scale complex network with about one million nodes remains stable at about 60%, and the F-measure on a small real-world social network is improved by 20% compared with other traditional community detection algorithms.
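The abstract does not spell out the algorithm itself. GraphFrames documents its `GraphFrame.labelPropagation(maxIter)` as static label propagation: every vertex starts in its own community, and at each superstep it adopts the most frequent label among its neighbours, stopping after `maxIter` supersteps. The following is a minimal single-machine sketch of that update rule in plain Python, not the paper's cluster code. The function name `label_propagation` and the smallest-label tie-break are assumptions made here for illustration and determinism; GraphFrames does not promise this tie-break.

```python
from collections import Counter


def label_propagation(edges, max_iter=5):
    """Synchronous label propagation on an undirected graph (illustrative sketch).

    edges: iterable of (u, v) vertex pairs.
    Returns a dict mapping each vertex to its community label.
    """
    # Build an undirected adjacency list.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    # Every vertex starts in its own community.
    labels = {v: v for v in neighbours}

    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbours.items():
            # Count the labels currently held by this vertex's neighbours.
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            # Assumed tie-break: smallest label among the most frequent ones.
            new_labels[v] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # converged: no vertex changed its label
        labels = new_labels
    return labels


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(label_propagation(edges))
```

All labels are updated at once from the previous superstep's values, which mirrors the message-passing supersteps used on a cluster. Synchronous updates can oscillate on some graphs (for example bipartite ones), which is why an iteration cap such as `maxIter` is needed.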
Keywords
Complex networks, parallel graph computing, big data clusters, GraphFrames, Label Propagation Algorithm