Balanced parallel triangle enumeration with an adaptive algorithm

Distributed and Parallel Databases (2024)

Abstract
Triangle enumeration is a fundamental building block for solving harder graph problems arising in social networks, the Internet and transportation, to name a few applications. The problem is well studied in the theory literature, but it remains open for big data. In this paper, we defend the idea of solving triangle enumeration with SQL queries that evaluate the steps of a new adaptive algorithm with linear speedup. Such an SQL approach provides scalability beyond RAM limits, automatic parallel processing and, more importantly, linear speedup as more machines are added. We present theoretical results and an experimental validation showing that our solution works well on large graphs analyzed on a parallel cluster with many machines, producing a balanced workload even when vertex degrees are highly skewed. We consider two types of distributed systems: (1) a parallel DBMS that evaluates SQL queries, and (2) a parallel HPC cluster that calls the MPI library from Python. Extensive benchmark experiments with large graphs show that our SQL solution offers many advantages over MPI and over competing graph analytics systems.
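To make the idea of "triangle enumeration with SQL queries" concrete, here is a minimal single-machine sketch using Python's built-in sqlite3 module. It shows the textbook three-way self-join on an edge table in which each undirected edge is stored once as (i, j) with i < j, so every triangle (i, j, k) with i < j < k is produced exactly once. This is not the paper's adaptive algorithm: it does no partitioning, parallelism or skew handling, and the table name E and the function enumerate_triangles are hypothetical names chosen for illustration.

```python
import sqlite3


def enumerate_triangles(edges):
    """Return every triangle (i, j, k) with i < j < k exactly once (illustrative sketch)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical edge table E(i, j); each undirected edge is stored once with i < j.
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    # Normalize orientation, drop self-loops, and deduplicate edges with a set.
    conn.executemany(
        "INSERT INTO E VALUES (?, ?)",
        {(min(u, v), max(u, v)) for u, v in edges if u != v},
    )
    # Path i -> j -> k via two edges, closed by a third edge i -> k.
    rows = conn.execute(
        """
        SELECT e1.i, e1.j, e2.j
        FROM E AS e1
        JOIN E AS e2 ON e1.j = e2.i
        JOIN E AS e3 ON e1.i = e3.i AND e2.j = e3.j
        ORDER BY e1.i, e1.j, e2.j
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    graph = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
    print(enumerate_triangles(graph))  # [(1, 2, 3), (2, 3, 4)]
```

Ordering vertices (i < j < k) is what prevents each triangle from being reported six times; on a graph with highly skewed degrees, this plain join can still concentrate work on hub vertices, which is the kind of imbalance the paper's adaptive algorithm targets.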
Keywords
Triangle enumeration, Triangle counting, Graph analytics, Adaptive algorithm, Intelligent data distribution