Optimization of Row Pattern Matching over Sequence Data in Spark SQL

DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I (2019)

Abstract
Owing to advances in information and communications technology and sensor technology, a large quantity of sequence data (time series data, log data, etc.) is generated and processed every day. Row pattern matching over sequence data stored in relational databases was standardized as SQL/RPR in 2016. Today, in addition to relational databases, there are many frameworks, such as MapReduce and Spark, for processing large amounts of data in parallel and distributed computing environments. Hive and Spark SQL let users write data analysis processes in SQL-like query languages, and row pattern matching is beneficial in Hive and Spark SQL as well. However, the computational cost of row pattern matching is high, so the process needs to be made more efficient. In this paper, we propose two optimization methods that reduce the computational cost of row pattern matching. We focus on Spark and present the design and implementation of the proposed methods for Spark SQL. Experiments verify that our optimization methods reduce the processing time of Spark SQL queries that include row pattern matching.
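As an illustration of what SQL/RPR row pattern matching computes, the sketch below evaluates the classic "V-shape" pattern from the MATCH_RECOGNIZE literature. In SQL it corresponds to `PATTERN (STRT DOWN+ UP+)` with `DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price)`, using `ONE ROW PER MATCH` and `AFTER MATCH SKIP PAST LAST ROW`. This is a minimal, single-partition sketch in plain Python. It is not the paper's Spark SQL implementation or its optimization methods, and the names `Row` and `match_v_shape` are hypothetical. The naive start-row-by-start-row scan shows why the cost grows with the number of rows and candidate start positions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    ts: int        # ordering column (plays the role of ORDER BY ts)
    price: float   # measure tested by the DEFINE conditions


def match_v_shape(rows: List[Row]) -> List[Tuple[int, int, int]]:
    """Naively evaluate PATTERN (STRT DOWN+ UP+) over one partition.

    Hypothetical helper, not the paper's method.
    DOWN: price < PREV(price); UP: price > PREV(price); STRT: always true.
    Returns one (start_ts, bottom_ts, end_ts) tuple per match and resumes
    after the last matched row (AFTER MATCH SKIP PAST LAST ROW).
    """
    rows = sorted(rows, key=lambda r: r.ts)
    n = len(rows)
    matches: List[Tuple[int, int, int]] = []
    i = 0
    while i < n:
        # DOWN+: greedily consume strictly decreasing rows after STRT.
        j = i + 1
        while j < n and rows[j].price < rows[j - 1].price:
            j += 1
        bottom = j - 1
        if bottom == i:  # DOWN+ needs at least one row: try next start
            i += 1
            continue
        # UP+: greedily consume strictly increasing rows after the bottom.
        k = j
        while k < n and rows[k].price > rows[k - 1].price:
            k += 1
        last = k - 1
        if last == bottom:  # UP+ needs at least one row: try next start
            i += 1
            continue
        matches.append((rows[i].ts, rows[bottom].ts, rows[last].ts))
        i = last + 1  # SKIP PAST LAST ROW
    return matches


if __name__ == "__main__":
    prices = [10, 8, 6, 7, 9, 9, 5, 4, 6]
    data = [Row(ts, p) for ts, p in enumerate(prices, start=1)]
    print(match_v_shape(data))  # [(1, 3, 5), (6, 8, 9)]
```

Greedy matching without backtracking is safe here only because the DOWN and UP conditions are mutually exclusive. General patterns require an automaton with backtracking, which is part of what makes row pattern matching expensive.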
Keywords
Sequence data, Pattern matching, Row Pattern Recognition, MATCH_RECOGNIZE, Spark SQL, Optimization