Summarizing Provenance of Aggregate Query Results in Relational Databases

2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) (2023)

Abstract
Data provenance is any information about the origin of a piece of data and the process that led to its creation. Most database provenance work has focused on creating models and semantics to query and generate this provenance information. While comprehensive, provenance information remains large and overwhelming, making it hard for data provenance systems to support data exploration. We present a new approach to provenance exploration that builds on data summarization techniques. We contribute novel summarization schemes for the provenance of aggregation queries and techniques for the fast generation of these summarization schemes. We introduce two types of summaries for aggregate queries. Impact summaries take into account the impact of specific groups of tuples in the provenance of the query on an aggregate result, and comparative summaries allow users to compare the provenance of two aggregate results. We also present algorithms for efficient computation of these summaries, implement optimizations using data sampling and feature selection, and conduct experiments and a user survey to show the feasibility and relevance of our approaches.
Keywords
provenance, databases, summarization
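
The two summary types named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a SUM aggregate, treats the "impact" of a group as its share of the aggregate value, and compares two results by the difference in those shares. The function names `impact_summary` and `compare_summaries`, the `region` and `amount` columns, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def impact_summary(provenance, group_key, value_key):
    """Return each group's share of a SUM aggregate, largest share first.

    `provenance` is a list of dicts, one per input tuple that contributed
    to a single aggregate result (an assumed, simplified representation).
    """
    totals = defaultdict(float)
    for row in provenance:
        totals[row[group_key]] += row[value_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # shares are undefined when the aggregate is zero
    shares = {group: total / grand_total for group, total in totals.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


def compare_summaries(summary_a, summary_b):
    """Return, per group, the share in A minus the share in B."""
    groups = set(summary_a) | set(summary_b)
    return {g: summary_a.get(g, 0.0) - summary_b.get(g, 0.0) for g in groups}


# Hypothetical provenance of two results of a SUM(amount) query.
provenance_q1 = [
    {"region": "east", "amount": 60.0},
    {"region": "east", "amount": 20.0},
    {"region": "west", "amount": 15.0},
    {"region": "north", "amount": 5.0},
]
provenance_q2 = [
    {"region": "east", "amount": 25.0},
    {"region": "west", "amount": 75.0},
]

summary_q1 = impact_summary(provenance_q1, "region", "amount")
summary_q2 = impact_summary(provenance_q2, "region", "amount")
difference = compare_summaries(summary_q1, summary_q2)
```

Here `summary_q1` ranks the groups by how much of the first result they account for, and `difference` shows which groups matter more to one result than to the other. A real system would also need to handle other aggregates (AVG, COUNT, MIN/MAX) and to choose which grouping attributes to summarize on, which this sketch leaves out.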