LBFM: Multi-Dimensional Membership Index for Block-Level Data Skipping

2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Abstract
Data skipping is a promising technique for reducing data access in query engines. By maintaining metadata for each block of tuples, a query engine can skip a block when the metadata indicates that the block contains no relevant data. The key factor is how to build effective metadata by extracting representative features of blocks. In this paper, we propose a multi-dimensional index, the Layered Bloom Filter Matrix (LBFM), which adopts a recursively layered framework and represents the matrix as an ordered hierarchy of hashmaps and bitmaps rather than as a space-consuming bit matrix, thereby reducing space consumption. Additionally, LBFM supports dimension combination cutting, from which an optimal indexing strategy can be generated, further improving space efficiency. We analyze the time complexity of LBFM and theoretically prove that LBFM has lower space consumption than the Bloom Filter Matrix algorithm. We prototyped our index technique on Spark SQL. Our experiments on TPC-H and a real-world workload show that LBFM significantly improves query response time over traditional methods.
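
The general idea of block-level data skipping with a membership index can be illustrated with a minimal sketch. This is not the LBFM structure from the paper: it uses one plain, single-column Bloom filter per block, and every name in it (BloomFilter, build_block_metadata, scan_equals, the bit-array size, and the number of hash functions) is a hypothetical choice made for illustration only.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; an illustrative stand-in, not the paper's LBFM."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # assumed size, chosen arbitrarily
        self.num_hashes = num_hashes  # assumed hash count, chosen arbitrarily
        self.bits = 0                 # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_block_metadata(blocks, column):
    """Build one Bloom filter per block over the values of a single column."""
    metadata = []
    for block in blocks:
        bf = BloomFilter()
        for row in block:
            bf.add(row[column])
        metadata.append(bf)
    return metadata


def scan_equals(blocks, metadata, column, value):
    """Evaluate `column == value`, skipping blocks the metadata rules out."""
    matches, blocks_read = [], 0
    for block, bf in zip(blocks, metadata):
        if not bf.might_contain(value):
            continue  # safe to skip: Bloom filters have no false negatives
        blocks_read += 1
        matches.extend(row for row in block if row[column] == value)
    return matches, blocks_read
```

Because a Bloom filter can return false positives but never false negatives, skipping never drops a matching tuple; a false positive only costs an unnecessary block read. A multi-dimensional index such as the one proposed in the paper targets predicates over combinations of columns, which this single-column sketch does not attempt to cover.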
Keywords
data skipping,membership index,bloom filter,bitmap