Permutation Index: Exploiting Data Skew For Improved Query Performance

2020 IEEE 36th International Conference on Data Engineering (ICDE 2020)

Abstract
Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains, but current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines, resulting in better spatial locality, and better utilization of limited cache resources. We analyze cache behavior, and implement database operators that are efficient in the presence of skew. Experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude.
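The repositioning idea can be illustrated with a toy model. The sketch below is not the paper's index structure; it only assumes a table of fixed-size entries, a 64-byte cache line, and 8-byte entries (all hypothetical parameters), and shows how ordering slots by access frequency packs the popular entries into fewer cache lines. The names `build_permutation` and `lines_touched` are invented for this illustration.

```python
from collections import Counter

CACHE_LINE_BYTES = 64   # assumed cache-line size
ITEM_BYTES = 8          # assumed size of one table entry
ITEMS_PER_LINE = CACHE_LINE_BYTES // ITEM_BYTES


def build_permutation(accesses, n_items):
    """Return perm where perm[old_slot] is the new slot, hottest slots first."""
    freq = Counter(accesses)
    # Sort slots by descending access count; ties keep original slot order.
    order = sorted(range(n_items), key=lambda s: (-freq[s], s))
    perm = [0] * n_items
    for new_slot, old_slot in enumerate(order):
        perm[old_slot] = new_slot
    return perm


def lines_touched(slots):
    """Count the distinct cache lines that cover the given slots."""
    return len({s // ITEMS_PER_LINE for s in slots})


# Toy skewed workload: 4 hot slots scattered across a 64-slot table,
# each accessed far more often than the remaining slots.
hot = [3, 17, 40, 58]
accesses = hot * 100 + list(range(64))
perm = build_permutation(accesses, 64)

before = lines_touched(hot)                      # hot slots in original layout
after = lines_touched([perm[s] for s in hot])    # hot slots after repositioning
print(before, after)
```

In this toy layout the four hot entries start out on four different cache lines and end up sharing one after the permutation, which is the spatial-locality effect the abstract describes. A real system would also have to account for maintaining the mapping and for changing access patterns, which this sketch ignores.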
Keywords
popular items, cache line, cache resources, cache behavior, synthetic data, in-memory query performance, permutation index, data skew, improved query performance, analytic queries, large-scale data analysis, commercial domains today, scientific domains today, medical domains today, ubiquitous feature, real-world domains, cache resident, popular data item, index structure, data items