Spark Checkpoint Fault Tolerance Strategy Based on Hybrid Storage

Shunjie Pan, Junyang Yu, Han Li, Bohan Li

2023 2nd International Conference on Artificial Intelligence and Intelligent Information Processing (AIIIP) (2023)

Abstract
Spark is a parallel computing framework that processes data in memory to minimize disk access and improve processing efficiency. However, Spark does not take the intermediate results of jobs into account: when a cluster node fails, the data blocks being processed are lost, and recomputing them from lineage is costly. To address the low data recovery efficiency of Spark's lineage-based fault tolerance, this paper introduces the Resilient Checkpointing Strategy based on Hybrid Storage (RCSA). The paper first analyzes the Spark checkpointing process, then presents the design of the hybrid storage module, and finally proposes a checkpoint selection algorithm based on multi-branch points. Experimental results indicate that the strategy effectively reduces Spark's backtracking overhead and improves job recovery efficiency.
Keywords
Spark, Checkpoint, Hybrid Storage, Parallel Computing
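
The abstract does not spell out the checkpoint selection algorithm, so the following is only a minimal sketch of one plausible reading of "multi-branch points": an RDD whose output feeds two or more downstream RDDs, so that losing it forces several branches to backtrack. The function name `multi_branch_points`, the dictionary representation of the lineage, and the example RDD names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def multi_branch_points(lineage):
    """Return RDD names that are parents of at least two other RDDs.

    `lineage` is an assumed representation: it maps each RDD name to the
    list of parent RDD names it is derived from.
    """
    children = defaultdict(set)
    for rdd, parents in lineage.items():
        for parent in parents:
            children[parent].add(rdd)
    # An RDD with two or more distinct children is a branch point.
    return sorted(name for name, kids in children.items() if len(kids) >= 2)


# Hypothetical lineage: "parsed" and "filtered" are each reused downstream.
lineage = {
    "raw": [],
    "parsed": ["raw"],
    "filtered": ["parsed"],
    "counts": ["filtered"],
    "joined": ["filtered", "parsed"],
}
candidates = multi_branch_points(lineage)
print(candidates)
```

In a real Spark job, an RDD chosen this way could be persisted through Spark's built-in mechanism (`SparkContext.setCheckpointDir` followed by `RDD.checkpoint()`); how RCSA weighs candidates and places them across the hybrid storage tiers is not described in the abstract and is not modeled here.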