Regression with linked datasets subject to linkage error

WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS (2022)

Abstract
Data are often collected from multiple heterogeneous sources and are combined subsequently. In combining data, record linkage is an essential task for linking records in datasets that refer to the same entity. Record linkage is generally not error-free; there is a possibility that records belonging to different entities are linked or that records belonging to the same entity are missed. It is not advisable to simply ignore such errors because they can lead to data contamination and introduce bias in sample selection or estimation, which, in turn, can lead to misleading statistical results and conclusions. For a long while, this problem was not properly recognized, but in recent years a growing number of researchers have developed methodology for dealing with linkage errors in regression analysis with linked datasets. The main goal of this overview is to give an account of those developments, with an emphasis on recent approaches and their connection to the so-called "Broken Sample" problem. We also provide a short empirical study that illustrates the efficacy of corrective methods in different scenarios. This article is categorized under: Statistical Models > Model Selection; Statistical and Graphical Methods of Data Analysis > Robust Methods; Statistical and Graphical Methods of Data Analysis > Multivariate Analysis.
Keywords
Bayesian analysis, data integration, linkage error, mixture models, record linkage, regression