Online data fusion

Hosted Content (2011)

Citations: 85 | Views: 33
Abstract
The Web contains a significant volume of structured data in various domains, but much of it is dirty or erroneous, and errors can propagate through copying. Data integration techniques allow querying structured data on the Web, but they take the union of the answers retrieved from different sources and can thus return conflicting information. Data fusion techniques, on the other hand, aim to find the true values, but they are designed for offline data aggregation and can take a long time.

This paper proposes Solaris, the first online data fusion system. It starts by returning answers from the first probed source, and refreshes the answers as it probes more sources and applies fusion techniques to the retrieved data. For each returned answer, it shows the likelihood that the answer is correct, and it stops retrieving data for that answer once it is confident that data from the unprocessed sources are unlikely to change it. We address key problems in building such a system and show empirically that the system can start returning correct answers quickly and terminate fast without sacrificing the quality of the answers.
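The probe, refresh, and stop loop described in the abstract can be illustrated with a minimal sketch. This is not Solaris's actual algorithm, which the abstract does not spell out. The per-source accuracy values, the log-odds vote weight, the softmax-style probability estimate, and the function names (`vote_weight`, `online_fuse`) are all assumptions made for illustration. Copying between sources is ignored, and the stopping rule is a stricter stand-in for "unlikely to change": it stops only when the unprobed sources cannot overturn the leader under this scoring.

```python
import math
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical input: (source name, assumed accuracy in (0.5, 1), claimed value or None).
Source = Tuple[str, float, Optional[str]]


def vote_weight(accuracy: float) -> float:
    """Log-odds weight (an assumption): more accurate sources count for more."""
    return math.log(accuracy / (1.0 - accuracy))


def online_fuse(sources: List[Source]) -> Iterator[Tuple[str, str, float, bool]]:
    """Probe sources in order. After each probe that leaves at least one
    candidate value, yield (source probed, current best value,
    estimated probability, done flag)."""
    scores: Dict[str, float] = {}
    remaining = sum(vote_weight(acc) for _, acc, _ in sources)
    for name, acc, value in sources:
        w = vote_weight(acc)
        remaining -= w  # total weight of sources not yet probed
        if value is not None:
            scores[value] = scores.get(value, 0.0) + w
        if not scores:
            continue
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        # Softmax over vote scores as a rough probability (an assumption).
        total = sum(math.exp(s) for s in scores.values())
        prob = math.exp(best_score) / total
        # Stop when the unprobed sources together cannot overturn the leader.
        done = best_score - runner_up > remaining
        yield name, best, prob, done
        if done:
            return


if __name__ == "__main__":
    probes = [("s1", 0.6, "B"), ("s2", 0.9, "A"), ("s3", 0.9, "A")]
    for step in online_fuse(probes):
        print(step)
```

In this sketch the answer is refreshed after every probe, so a weak first source can be overruled once stronger sources are read, and probing ends early when two strong sources already agree and only weak ones remain.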