GraphLab: A Distributed Framework for Machine Learning in the Cloud
CoRR(2011)
Abstract
Machine Learning (ML) techniques are indispensable in a wide range of fields.
Unfortunately, the exponential growth of dataset sizes is rapidly extending
the runtimes of sequential algorithms and threatening to slow future progress in
ML. With the promise of affordable large-scale parallel computing, Cloud
systems offer a viable platform to resolve the computational challenges in ML.
However, designing and implementing efficient, provably correct distributed ML
algorithms is often prohibitively challenging. To enable ML researchers to
easily and efficiently use parallel systems, we introduced the GraphLab
abstraction which is designed to represent the computational patterns in ML
algorithms while permitting efficient parallel and distributed implementations.
In this paper we provide a formal description of the GraphLab parallel
abstraction and present an efficient distributed implementation. We conduct a
comprehensive evaluation of GraphLab on three state-of-the-art ML algorithms
using real large-scale data and a 64-node EC2 cluster with 512 processors. We
find that GraphLab achieves orders-of-magnitude performance gains over Hadoop
while performing comparably to, or better than, hand-tuned MPI implementations.
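The abstraction referred to above centers on update functions applied to vertices of a data graph, with a scheduler that dynamically reactivates vertices whose neighborhoods changed. As a rough illustration only, here is a minimal serial sketch of that pattern using PageRank; the function names, the adjacency-dict representation, and the simple FIFO scheduler are assumptions for this sketch and are not the GraphLab API:

```python
# Hypothetical sketch of a GraphLab-style vertex update pattern (serial,
# simplified). Not the actual GraphLab API.
from collections import deque

def pagerank_update(v, graph, rank, damping=0.85, tol=1e-3):
    """Recompute v's rank from its in-neighbors ("scope" data) and
    return the neighbors to reschedule if the change exceeded tol."""
    total = sum(rank[u] / len(graph[u]) for u in graph if v in graph[u])
    new_rank = (1 - damping) + damping * total
    changed = abs(new_rank - rank[v]) > tol
    rank[v] = new_rank
    return graph[v] if changed else []

def run(graph):
    # Dynamic scheduling: a vertex is reactivated only when an update
    # to a neighbor changed significantly, rather than sweeping all
    # vertices every iteration.
    rank = {v: 1.0 for v in graph}
    queue = deque(graph)
    while queue:
        v = queue.popleft()
        for u in pagerank_update(v, graph, rank):
            if u not in queue:
                queue.append(u)
    return rank
```

The point of the pattern is that the runtime, not the algorithm author, decides which vertices to execute and in what order, which is what lets the same program run serially, in parallel, or distributed.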