A Speculative Parallel Execution Model for Apache Spark

2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), 2018

Abstract

Apache Spark is currently a popular distributed computing platform owing to its strong capacity for big data computing, and its large-scale parallel execution relies on a divide-and-conquer strategy. However, irregular algorithms cannot be divided into small-scale problems and processed in parallel on Apache Spark, because complex dependencies within the input data prevent the input from being divided. To remedy this, this paper proposes a speculative parallel execution model for Apache Spark based on the Thread-Level Speculation technique. With the proposed model, the complex dependencies within the input data are handled by a speculative strategy, so the input can be divided into small chunks. Accordingly, irregular algorithms can also be partitioned into a series of small-scale tasks and executed in parallel. After parallel execution, the proposed model applies an evaluation method to eliminate incorrect speculations and ensure the correctness of the final output. Finally, to verify the practicability and scalability of the proposed model, an intrusion prevention system (IPS) is implemented on it, with encouraging results. Experiments show that the proposed model markedly reduces the execution time of big data intrusion prevention. Overall, adopting the model can significantly improve the efficiency of irregular algorithms on Apache Spark.
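The abstract does not give the model's details, so the following is only a minimal sketch of the general speculate-then-validate idea it describes, not the paper's implementation. It assumes a toy scanner (a small state machine that counts occurrences of the hypothetical signature "abc") whose state carries across chunk boundaries, which is the kind of dependency that blocks naive partitioning. Every chunk is scanned in parallel from a guessed start state, and a validation pass then re-executes any chunk whose guess turns out to be wrong. Plain Python threads stand in for Spark tasks, and all names (`step`, `run_chunk`, `speculative_scan`) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def step(state, ch):
    """Toy DFA for the hypothetical signature 'abc'.

    Returns (next_state, matched). States: 0 = nothing seen,
    1 = 'a' seen, 2 = 'ab' seen.
    """
    if ch == "a":
        return 1, False
    if state == 1 and ch == "b":
        return 2, False
    if state == 2 and ch == "c":
        return 0, True
    return 0, False


def run_chunk(chunk, start_state):
    """Scan one chunk from start_state; return (end_state, match_count)."""
    state, matches = start_state, 0
    for ch in chunk:
        state, hit = step(state, ch)
        matches += hit
    return state, matches


def sequential_scan(data):
    """Reference result: scan the whole input in order."""
    return run_chunk(data, 0)[1]


def speculative_scan(data, chunk_size, guess=0):
    """Scan chunks in parallel from a guessed start state, then validate.

    Returns (total_matches, number_of_chunks_re_executed).
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Speculative phase: every chunk assumes it starts in state `guess`.
    with ThreadPoolExecutor() as pool:
        speculated = list(pool.map(lambda c: run_chunk(c, guess), chunks))

    # Validation phase: walk the chunks in order with the true start state.
    state, total, reexecuted = 0, 0, 0
    for chunk, (end_state, matches) in zip(chunks, speculated):
        if state != guess:
            # Mis-speculation: discard the result and redo this chunk.
            end_state, matches = run_chunk(chunk, state)
            reexecuted += 1
        state = end_state
        total += matches
    return total, reexecuted


if __name__ == "__main__":
    data = "zzabczabczzabc"
    print(sequential_scan(data), speculative_scan(data, 4))
```

In this sketch the validation pass is sequential and re-executes mis-speculated chunks one at a time, so the benefit depends on how often the guessed start state is correct. How the paper's evaluation method handles this is not stated in the abstract.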

Key words

Cluster computing, IP networks, Data models, Computational modeling, Task analysis, Parallel processing, Instruction sets
