Phase Annotated Learning for Apache Spark: Workload Recognition and Characterization

2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2018

Abstract
In this paper, we introduce and evaluate a novel resource modeling technique for workload profiling, detection, and resource usage prediction for Spark workloads. Specifically, we profile resource usage data in Spark and annotate it with the application contexts in which the resources were used. We then model the resource usage, per context, with a Mixture of Gaussians (MOG) probabilistic model. When we recognize a similar workload, we can thus predict its resource usage for the contexts modeled a priori. To experimentally test the functionality of our Spark stage annotator and workload modeling tool, we performed workload profiling for eight Apache Spark workloads. Our results show that, whenever a previously modeled workload is detected, our MOG models can be used to predict resource consumption with high accuracy.
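As a rough illustration of per-context MOG modeling, the sketch below fits a one-dimensional mixture of Gaussians to resource samples grouped by stage label, using plain expectation-maximization. It is not the authors' tool: the stage names, the CPU samples, the choice of two components, and the helpers `fit_mog` and `expected_usage` are all assumptions made for this example, and taking the mixture mean as the predicted usage is only one simple choice of predictor.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mog(samples, k=2, iterations=50, min_var=1e-6):
    """Fit a 1-D mixture of k Gaussians with EM.

    Returns (weights, means, variances). Hypothetical helper for illustration.
    """
    data = sorted(samples)
    n = len(data)
    # Deterministic initialization: place the means at evenly spaced quantiles.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def expected_usage(model):
    """Mixture mean, used here as a simple point prediction (an assumption)."""
    weights, means, _ = model
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical CPU utilization samples (percent), annotated by Spark stage.
profiles = {
    "stage_0_map": [20, 22, 21, 19, 80, 82, 79, 81],
    "stage_1_reduce": [50, 52, 48, 51, 49, 50],
}

# One MOG model per annotated context.
models = {stage: fit_mog(samples, k=2) for stage, samples in profiles.items()}

for stage, model in models.items():
    print(stage, "predicted mean CPU usage:", round(expected_usage(model), 2))
```

Once a running workload is recognized as one that was modeled before, the stored per-stage model could be looked up by stage label and queried in this way. A fuller version would model several resources jointly and select the number of components from the data.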
Key words
Workload Detection, Resource Prediction, Mixture of Gaussian, Dynamic Time Warping