A Statistical Perspective on Discovering Functional Dependencies in Noisy Data

SIGMOD/PODS '20: International Conference on Management of Data Portland OR USA June, 2020(2020)

引用 26|浏览59
暂无评分
摘要
We study the problem of discovering functional dependencies (FD) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy data set is equivalent to learning the structure of a model over binary random variables, where each random variable corresponds to a functional of the data set attributes. We build upon this observation to introduce FDX a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that FDX can recover true functional dependencies across a diverse array of real-world and synthetic data sets, even in the presence of noisy or missing data. We find that FDX scales to large data instances with millions of tuples and hundreds of attributes while it yields an average F1 improvement of 2x against state-of-the-art FD discovery methods.
更多
查看译文
关键词
Functional Dependencies, Structure Learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要