Scalable Graph Convolutional Network based Link Prediction on a Distributed Graph Database Server

Anuradha Karunarathna, Dinika Senarath, Shalika Madhushanki, Chinthaka Weerakkody, Miyuru Dayarathna, Sanath Jayasena, Toyotaro Suzumura

2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020

Abstract
Graph Convolutional Networks (GCNs) have become a popular means of performing link prediction due to their high accuracy. However, scaling such link prediction to large graphs with billions of vertices and edges and rich attribute types is a significant challenge because of the storage and computation limits of individual machines. In this paper we present a scalable link prediction approach that conducts GCN training and link prediction on top of a distributed graph database server called JasmineGraph. We partition the graph data and persist the partitions across multiple workers. We implement parallel graph node embedding generation with the GraphSAGE algorithm across these workers, and avoid performance bottlenecks in GCN training by using an intelligent scheduling algorithm. We show that our approach scales well with an increasing number of partitions (2, 4, 8, and 16) on four real-world data sets: Twitter, Amazon, Reddit, and DBLP-V11. JasmineGraph trained a GCN on the largest dataset, DBLP-V11 (> 9.3 GB), in 11 hours and 40 minutes using 16 workers on a single server, while the original GraphSAGE implementation could not process it at all. The original GraphSAGE implementation processed the second largest dataset, Reddit, in 238 minutes, whereas JasmineGraph took only 100 minutes on the same hardware with 16 workers, a 2.4x performance improvement.
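The abstract describes partitioning the graph across workers and training GraphSAGE embeddings per partition in parallel under an intelligent scheduler. Below is a minimal, hypothetical sketch of that overall idea using a Python process pool with a longest-partition-first dispatch order; the function names and the greedy scheduling heuristic are illustrative assumptions and do not reflect JasmineGraph's actual worker API or scheduler.

```python
# Minimal, hypothetical sketch: train GraphSAGE embeddings per graph
# partition in parallel on a fixed pool of worker processes, dispatching
# larger partitions first so no single big partition becomes a straggler.
# `train_graphsage_on_partition` is a placeholder; JasmineGraph's actual
# worker API and scheduling algorithm are not reproduced here.
from multiprocessing import Pool


def train_graphsage_on_partition(partition_id):
    # Placeholder: load this partition from the local worker store and
    # run GraphSAGE to produce node embeddings for its vertices.
    return partition_id, f"embeddings for partition {partition_id}"


def run_training(partition_sizes, num_workers=16):
    # Longest-job-first greedy order: idle workers pull the largest
    # remaining partition, keeping the pool evenly loaded.
    order = sorted(partition_sizes, key=partition_sizes.get, reverse=True)
    with Pool(processes=num_workers) as pool:
        return dict(pool.imap_unordered(train_graphsage_on_partition, order))


if __name__ == "__main__":
    # Example: 16 partitions with made-up vertex counts.
    sizes = {i: (i + 1) * 10_000 for i in range(16)}
    print(run_training(sizes, num_workers=4))
```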
Keywords
Machine Learning, Graph databases, Database management, Distributed databases, Graph theory, Graph Convolutional Neural Networks, Deep learning, Link Prediction