Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.

NSDI'12: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (2012)

Abstract
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
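The idea of coarse-grained transformations with recovery by recomputation can be illustrated with a small sketch. This is a toy, single-process model and not Spark's API: the class `ToyRDD` and its methods `map`, `filter`, `collect`, and `lose_cache` are hypothetical names chosen for illustration. Each dataset records only its parent and the transformation that produced it, so a lost in-memory result can be rebuilt from that record instead of being replicated.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not the Spark API).

    A dataset is described by its lineage: either a source function that
    can re-read the base data, or a parent dataset plus one coarse-grained
    operation applied to every element of that parent.
    """

    def __init__(self, source=None, parent=None, op=None):
        self._source = source   # callable returning the base records
        self._parent = parent   # dataset this one was derived from
        self._op = op           # whole-dataset transformation
        self._cache = None      # in-memory result, may be lost

    def map(self, f):
        # Record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, op=lambda rows: [f(x) for x in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda rows: [x for x in rows if pred(x)])

    def collect(self):
        # Compute on demand, rebuilding from lineage if the cache is empty.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source())
            else:
                self._cache = self._op(self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        # Simulate losing the in-memory copy, e.g. after a node failure.
        self._cache = None


base = ToyRDD(source=lambda: range(10))
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = squares_of_evens.collect()
squares_of_evens.lose_cache()           # the cached result disappears
recovered = squares_of_evens.collect()  # rebuilt by replaying the lineage
```

In this sketch `recovered` equals `first` because the recorded transformations are deterministic and are simply replayed. A real system would partition the data across machines and recompute only the lost partitions, which this toy model does not attempt.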
Keywords
memory abstraction, shared memory, interactive data mining tool, iterative algorithm, iterative job, shared state, coarse-grained transformation, current computing framework, fault tolerance, fault-tolerant manner, fault-tolerant abstraction, in-memory cluster computing