A Characterization and Comparison of Spatial-Temporal Applications and Internet Big Data Benchmarks

2018 26th International Conference on Geoinformatics (2018)

Abstract
An urban traffic data analysis platform is important infrastructure for a modern city. As the spatial-temporal data produced by transportation systems grows explosively, operators in the traffic field are trying to adopt emerging big data solutions that originated in the internet industry. However, it is hard to find a cost-effective solution for building such a platform because of the diverse combinations of hardware and software configurations. Currently, operators select solutions based on simple evaluation results from internet benchmarks such as TeraSort. Two issues have never been fully explored: (1) whether it is appropriate to evaluate a solution for spatial-temporal applications with internet benchmarks; and (2) what the characteristics of spatial-temporal applications are and what optimization measures they suggest. We address these issues with a novel workload characterization tool for big data applications, called Extensible Metric Importance Analysis (EMIA). The key idea is a performance model based on ensemble learning, which takes program metrics as input, outputs a performance metric such as execution time, and ranks the input metrics by their importance. Based on EMIA, we apply principal component analysis (PCA) to the program behaviors of five representative spatial-temporal applications and nine popular internet big data benchmarks. Experimental results show that spatial-temporal applications present unique characteristics and that it is unreasonable to evaluate solutions for them with internet benchmarks. Moreover, we optimize spatial-temporal applications by applying optimization measures to the key factors identified by EMIA, achieving clear performance improvements.
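The abstract does not specify EMIA's model, so the following is only a minimal sketch of the general idea it describes: fit an ensemble that predicts execution time from program metrics, then rank the metrics by how much they contribute to the fit. It assumes a bagged ensemble of one-split regression stumps as a stand-in for the paper's ensemble learner. The metric names and the synthetic data are hypothetical and chosen purely for illustration.

```python
import random


def best_stump(rows, target, features):
    """Return (feature_index, sse_reduction) of the best one-threshold split."""
    n = len(rows)
    total = sum(target)
    total_sq = sum(t * t for t in target)
    base_sse = total_sq - total * total / n
    best = (None, 0.0)
    for f in features:
        order = sorted(range(n), key=lambda i: rows[i][f])
        left_sum = left_sq = 0.0
        for k in range(1, n):
            i = order[k - 1]
            left_sum += target[i]
            left_sq += target[i] ** 2
            if rows[order[k]][f] == rows[i][f]:
                continue  # no valid threshold between equal values
            right_sum = total - left_sum
            right_sq = total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
            gain = base_sse - sse
            if gain > best[1]:
                best = (f, gain)
    return best


def rank_metrics(rows, target, names, n_stumps=50, seed=0):
    """Rank metrics by their share of error reduction across a bagged ensemble."""
    rng = random.Random(seed)
    n = len(rows)
    importance = [0.0] * len(names)
    for _ in range(n_stumps):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        # Random feature subset, so weaker metrics are also evaluated.
        feats = rng.sample(range(len(names)), k=max(1, len(names) - 1))
        f, gain = best_stump([rows[i] for i in sample],
                             [target[i] for i in sample], feats)
        if f is not None:
            importance[f] += gain
    total = sum(importance) or 1.0
    shares = [(name, imp / total) for name, imp in zip(names, importance)]
    return sorted(shares, key=lambda p: p[1], reverse=True)


# Hypothetical metrics and synthetic data (not taken from the paper).
data_rng = random.Random(42)
names = ["cache_miss_rate", "io_wait", "branch_mispredict"]
rows = [[data_rng.random() for _ in names] for _ in range(200)]
# Assumed ground truth: execution time is driven mostly by the first metric.
times = [10 * r[0] + 3 * r[1] + data_rng.gauss(0, 0.1) for r in rows]

ranking = rank_metrics(rows, times, names)
for name, share in ranking:
    print(f"{name}: {share:.3f}")
```

In this sketch, a metric's importance is its share of the total squared-error reduction accumulated over all stumps. A production tool would more likely use deeper trees or a library ensemble, and could feed the top-ranked metrics into PCA to compare workloads.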
Keywords
urbanization, hbase, auto tuning, performance modeling, performance optimization, ensemble learning