RW-Tree: A Learned Workload-aware Framework for R-tree Construction

2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022)(2022)

引用 6|浏览44
暂无评分
摘要
R-tree is a popular index which supports efficient queries on multi-dimensional data. The performance of R-tree mostly depends on how the tree structure is built if new data instances are inserted, which has been studied for years. Existing works can be categorized into two groups. One is the bulk-loading approaches that insert data instances in batch, but they cannot support real-time insertion. Hence, our focus is on the other one that inserts each data instance individually, and thus fresh data can be instantly queried. However, existing methods do not consider the workload information, which leads to limited potential optimization opportunity. Therefore, it is important to study workload-aware R-tree construction for efficient multidimensional data access. There are several challenges. First, how to represent the query workload is a challenge. Second, given a workload, it is challenging to accurately measure the benefit of a data insertion choice. Third, both range queries and kNN queries should be considered in the workload. To address these challenges, we propose a novel framework that leverages a learning-based method to solve the workloadaware R-tree construction problem. First, by extracting the query workload features, we learn a distribution for the workload using the space partition. Second, considering the distribution, we design a cost model to describe the benefits ( i.e., query execution time) of different insertion choices and select the best one. Third, we convert the kNN queries to range search ones, so as to support the workload including both types of queries. Experimental results show that on OpenStreetMap real datasets, compared with baselines, we improve the query efficiency by 1.17x.
更多
查看译文
关键词
RW-tree,learned workload-aware framework,efficient queries,tree structure,data instance,bulk-loading approaches,real-time insertion,fresh data,workload information,efficient multidimensional data access,data insertion choice,range queries,kNN queries,learning-based method,workload-aware R-tree construction problem,query workload features,query execution time,different insertion choices,query efficiency
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要