
joinTree: A novel join-oriented multivariate operator for spatio-temporal data management in Flink

GeoInformatica (2022)

Abstract
In the era of the intelligent Internet, managing and analyzing massive spatio-temporal data is a key step toward realizing intelligent applications and building smart cities, and the interaction of multi-source data is the basis of that management and analysis. As an important platform for interactive computation over massive data, Flink provides the advanced Join operator to ease program development. In a Flink job with multi-source data connection operations, the choice of join sequence and the data communication in the repartition phase are both key factors in job efficiency. However, Flink provides no optimization mechanism for either factor, which leads to low job efficiency. If enumeration is used to find the optimal join sequence, the result cannot be obtained in polynomial time, so exhaustive search is not a practical optimization. We investigate these problems, design and implement joinTree, a more advanced operator that supports multi-source data connection in Flink, and introduce two optimization strategies into the operator. The advantages of our work are as follows: (1) the operator enables Flink to support multi-source data connection operations, and it improves job efficiency by introducing lightweight optimization strategies that reduce the amount of computation and data communication; (2) with the optimization strategy for the join sequence, the total running time can be reduced by 29% and the data communication by 34% compared with traditional sequential execution; (3) the optimization strategy for data repartition can bring a further 35% performance improvement and, in the average case, can reduce the data communication by 43%.
Keywords
Flink, Spatio-temporal data management, Data connection, Join sequence, Data repartition
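
The abstract's point about join sequences can be illustrated with a small sketch. It is not the joinTree algorithm, which the abstract does not describe; it only contrasts exhaustive enumeration of left-deep join orders (n! candidates for n sources) with a simple greedy heuristic. The cost model (sum of estimated intermediate result sizes), the source names, the row counts, and the selectivities are all hypothetical values chosen for illustration.

```python
from itertools import permutations
from math import factorial

# Hypothetical row counts per source and pairwise join selectivities.
SIZES = {"taxi": 1000, "road": 500, "poi": 100, "weather": 10}
SELECTIVITY = {
    frozenset(("taxi", "road")): 0.002,
    frozenset(("taxi", "poi")): 0.01,
    frozenset(("poi", "weather")): 0.1,
    frozenset(("road", "weather")): 0.1,
}


def left_deep_cost(order, sizes, selectivity):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    rows = float(sizes[order[0]])
    cost = 0.0
    for name in order[1:]:
        rows *= sizes[name]
        # Apply every known selectivity between the new source and those already joined.
        for prev in joined:
            rows *= selectivity.get(frozenset((prev, name)), 1.0)
        joined.append(name)
        cost += rows
    return cost


def exhaustive_order(sizes, selectivity):
    """Try all n! orders; only feasible for a handful of sources."""
    return min(permutations(sizes),
               key=lambda o: left_deep_cost(o, sizes, selectivity))


def greedy_order(sizes, selectivity):
    """Start from the smallest source, then always add the cheapest next join."""
    remaining = sorted(sizes)
    first = min(remaining, key=lambda n: sizes[n])
    order = [first]
    remaining.remove(first)
    while remaining:
        nxt = min(remaining,
                  key=lambda n: left_deep_cost(order + [n], sizes, selectivity))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    sequential = list(SIZES)  # join in the order the sources were declared
    for label, order in [("sequential", sequential),
                         ("greedy", greedy_order(SIZES, SELECTIVITY)),
                         ("exhaustive", exhaustive_order(SIZES, SELECTIVITY))]:
        print(label, list(order), left_deep_cost(order, SIZES, SELECTIVITY))
    print("orders to enumerate for 12 sources:", factorial(12))
```

The greedy heuristic evaluates only on the order of n² candidate joins, so it stays cheap as the number of sources grows, but it is not guaranteed to find the optimum that enumeration finds.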