Less is More: How Fewer Results Improve Progressive Join Query Processing

Xin Zhang,Ahmed Eldawy

35TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2023(2023)

引用 0|浏览0
暂无评分
摘要
With the requirements to enable data analytics and exploration interactively and efficiently, progressive data processing, especially progressive join, became essential to data science. Join queries are particularly challenging due to the correlation between input datasets which causes the results to be biased towards some join keys. Existing methods carefully control which parts of the input to process in order to improve the quality of progressive results. If the quality is not satisfactory, they will process more data to improve the result. In this paper, we propose an alternative approach that initially seems counter-intuitive but surprisingly works very well. After query processing, we intentionally report fewer results to the user with the goal of improving the quality. The key idea is that if the output is deviated from the correct distribution, we temporarily hide some results to correct the bias. As we process more data, the hidden results are inserted back until the full dataset is processed. The main challenge is that we do not know the correct output distribution while the progressive query is running. In this work, we formally define the progressive join problem with quality and progressive result rate constraints. We propose an input&output quality-aware progressive join framework (QPJ) that (1) provides input control that decides which parts of the input to process; (2) estimates the final result distribution progressively; (3) automatically controls the quality of the progressive output rate; and (4) combines input&output control to enable quality control of the progressive results. We compare QPJ with existing methods and show QPJ can provide the progressive output that can represent the final answer better than existing methods.
更多
查看译文
关键词
Progressive processing,progressive result quality control,progressive equi-join,progressive spatial join
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要