Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.

HotCloud'12: Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing (2012)

Abstract
Many important "big data" applications need to process data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional programming API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster. We have prototyped D-Streams in an extension to the Spark cluster computing framework called Spark Streaming, which lets users seamlessly intermix streaming, batch and interactive queries.
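The two core ideas in the abstract can be illustrated with a toy, single-machine Python sketch. First, the stream is discretized into short time intervals, and each interval's state is computed by a deterministic function of the previous state and that interval's batch. Second, lost state is rebuilt by recomputation that is split by key partition and run in parallel. This is not Spark Streaming's API. The functions `discretize`, `step`, `run`, and `parallel_recover` are hypothetical names, threads stand in for cluster nodes, and recovery is modeled as replaying batches from an assumed checkpoint, one partition per worker.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib


def discretize(events, interval):
    """Group (timestamp, key) events into consecutive batches of `interval` seconds."""
    batches = {}
    for ts, key in events:
        batches.setdefault(int(ts // interval), []).append(key)
    if not batches:
        return []
    # Empty intervals still produce an (empty) batch so time stays contiguous.
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def step(state, batch):
    """Deterministic per-interval update: running counts = old counts + this batch."""
    new_state = Counter(state)  # copy, so earlier states stay immutable
    new_state.update(batch)
    return new_state


def run(batches):
    """Compute the state after every interval as a chain of immutable snapshots."""
    states, state = [], Counter()
    for batch in batches:
        state = step(state, batch)
        states.append(state)
    return states


def partition_of(key, num_partitions):
    """Deterministic hash partitioning of keys (hypothetical helper)."""
    return zlib.crc32(key.encode()) % num_partitions


def recover_partition(p, checkpoint, batches, num_partitions):
    """Rebuild one partition's state by replaying batches from a checkpoint."""
    state = Counter({k: v for k, v in checkpoint.items()
                     if partition_of(k, num_partitions) == p})
    for batch in batches:
        state = step(state, [k for k in batch if partition_of(k, num_partitions) == p])
    return state


def parallel_recover(checkpoint, batches, num_partitions):
    """Recompute all partitions concurrently, then merge the disjoint results."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(
            lambda p: recover_partition(p, checkpoint, batches, num_partitions),
            range(num_partitions)))
    merged = Counter()
    for part in parts:
        merged.update(part)
    return merged


if __name__ == "__main__":
    events = [(0.1, "a"), (0.5, "b"), (1.2, "a"), (2.7, "c"), (2.9, "a")]
    batches = discretize(events, 1.0)
    states = run(batches)
    # Pretend the latest state was lost; rebuild it from the interval-0 checkpoint.
    recovered = parallel_recover(states[0], batches[1:], num_partitions=4)
    print(recovered == states[-1])
```

Because every interval's update is deterministic and the partitions are disjoint, each worker can redo only its own share of the lost computation. This is the intuition behind recovering lost state in parallel rather than relying on a single replica or upstream node.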
Keywords
fault recovery, efficient fault recovery, long recovery time, new recovery mechanism, parallel recovery, current programming model, high-level functional programming, new programming model, prototyped D-Streams, Spark Streaming, Discretized stream, fault-tolerant model, large cluster, stream processing