Streaming balanced graph partitioning algorithms for random graphs
SODA (2014)
Abstract
With recent advances in storage technology, it is now possible to store the vast amounts of data generated by cloud computing applications. The sheer size of 'big data' motivates the need for streaming algorithms that can compute approximate solutions without full random access to all of the data. In this paper, we consider the problem of loading a graph onto a distributed cluster with the goal of optimizing later computation. We model this as computing an approximately balanced k-partitioning of a graph in a streaming fashion, with only one pass over the data. We give lower bounds for this problem, showing that no algorithm can obtain an o(n) approximation under either a random or an adversarial stream ordering. We analyze two variants of a randomized greedy algorithm on random graphs with embedded balanced k-cuts. Both variants look at how the edges from the vertex being assigned are distributed across the partitions: one places the vertex in the partition that is the arg max of this distribution, and the other assigns the vertex to a partition with probability proportional to it. We theoretically bound the performance of each algorithm: the arg max variant is able to recover the embedded k-cut, while the proportional variant cannot. This matches the experimental results in [30].
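The two greedy variants described above can be illustrated with a minimal sketch. It assumes a planted-partition random graph (k equal-size hidden blocks, edge probability p_in inside a block and p_out across blocks) as a stand-in for "random graphs with embedded balanced k-cuts", a random stream order, and balance enforced by a hard per-partition capacity. The helper names (`planted_partition_graph`, `stream_partition`, `cut_edges`), the capacity rule, and the uniform tie-breaking are hypothetical choices for illustration, not the paper's exact model or algorithms.

```python
import math
import random
from collections import Counter


def planted_partition_graph(n, k, p_in, p_out, seed=0):
    """Hypothetical helper: random graph with k hidden blocks of (nearly) equal size.

    Each pair of vertices is joined independently, with probability p_in if
    both lie in the same hidden block and p_out otherwise.
    """
    rng = random.Random(seed)
    block = [v % k for v in range(n)]  # hidden (planted) block of each vertex
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if block[u] == block[v] else p_out
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj, block


def stream_partition(adj, k, capacity, rule, seed=0):
    """Hypothetical one-pass greedy k-partitioner over a random vertex order.

    rule="argmax": place v in the open partition holding most of v's
    already-placed neighbours (ties broken uniformly at random).
    rule="proportional": pick an open partition with probability
    proportional to that neighbour count.
    Balance is enforced by a hard per-partition capacity (an assumption).
    """
    if rule not in ("argmax", "proportional"):
        raise ValueError(f"unknown rule: {rule!r}")
    n = len(adj)
    if capacity * k < n:
        raise ValueError("capacity * k must be at least n")
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random stream ordering
    assign = [-1] * n
    sizes = [0] * k
    for v in order:
        # Distribution of v's edges over partitions, counting placed neighbours only.
        counts = Counter(assign[u] for u in adj[v] if assign[u] != -1)
        open_parts = [i for i in range(k) if sizes[i] < capacity]
        weights = [counts[i] for i in open_parts]
        if sum(weights) == 0:
            choice = rng.choice(open_parts)  # no usable signal: choose uniformly
        elif rule == "argmax":
            best = max(weights)
            choice = rng.choice([i for i, w in zip(open_parts, weights) if w == best])
        else:
            choice = rng.choices(open_parts, weights=weights)[0]
        assign[v] = choice
        sizes[choice] += 1
    return assign


def cut_edges(adj, assign):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u in range(len(adj)) for v in adj[u]
               if u < v and assign[u] != assign[v])


if __name__ == "__main__":
    n, k = 300, 3
    adj, block = planted_partition_graph(n, k, p_in=0.2, p_out=0.01, seed=1)
    capacity = math.ceil(1.1 * n / k)  # allow roughly 10% imbalance
    print("planted cut:", cut_edges(adj, block))
    for rule in ("argmax", "proportional"):
        assign = stream_partition(adj, k, capacity, rule, seed=2)
        print(rule, "cut:", cut_edges(adj, assign))
```

Comparing the cut of each rule against the cut of the hidden partition gives an informal, small-scale way to explore the abstract's claim that the arg max rule recovers the embedded cut while the proportional rule does not; it is not a substitute for the paper's analysis, and results will vary with the assumed parameters.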
Keywords
algorithms, design, graph algorithms, graph labeling, theory