Mistaken identities lead to missed opportunities: Testing for mean differences in partially matched data
arXiv (2023)
Abstract
It is increasingly common to collect pre-post data with pseudonyms or
self-constructed identifiers. In surveys of sensitive populations,
identifiers may be made optional to encourage higher response rates. Matching
pre- and post-intervention responses for every participant may therefore be
impossible in such applications, leaving practitioners with
a choice between the paired t-test on the matched samples and the two-sample
t-test on all samples for evaluating mean differences. We demonstrate the
inadequacies of both approaches: the former test requires discarding
unmatched data, while the latter test ignores correlation and assumes
independence. When a subset of the samples is matched, however, limited
inference about the correlation is possible. We propose a novel
technique for such 'partially matched' data, which we refer to as the
Quantile-based t-test for correlated samples, to assess mean differences using
a conservative estimate of the correlation between responses based on the
matched subset. Critically, our approach does not discard unmatched samples,
nor does it assume independence. Our results demonstrate that the proposed
method yields nominal Type I error probability while affording more power than
existing approaches. Practitioners can readily adopt our approach with basic
statistical programming software.
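
To make the choice described above concrete, the following minimal sketch computes the two conventional test statistics on synthetic partially matched data. It is not the proposed Quantile-based t-test, whose details the abstract does not give. The data, the sample sizes, the correlation of 0.7, the shift of 0.3, the function names `paired_t` and `welch_t`, and the use of the Welch variant of the two-sample test are all assumptions made for illustration.

```python
import math
import random
import statistics


def paired_t(pre, post):
    """Paired t statistic computed on matched pairs only; returns (t, df)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


def welch_t(pre, post):
    """Welch two-sample t statistic, which treats the samples as independent.

    Returns (t, df) with the Welch-Satterthwaite degrees of freedom.
    """
    v1 = statistics.variance(pre) / len(pre)
    v2 = statistics.variance(post) / len(post)
    t = (statistics.mean(post) - statistics.mean(pre)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(pre) - 1) + v2 ** 2 / (len(post) - 1))
    return t, df


# Synthetic example (all values below are illustrative assumptions).
random.seed(0)
n_total, n_matched, rho, shift = 60, 20, 0.7, 0.3

pre_all = [random.gauss(0, 1) for _ in range(n_total)]
# Post responses correlate with pre responses at roughly rho.
post_all = [
    shift + rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    for x in pre_all
]

# Option 1: paired test, keeping only the participants who can be matched.
t_paired, df_paired = paired_t(pre_all[:n_matched], post_all[:n_matched])

# Option 2: two-sample test on everyone, ignoring the pre-post correlation.
t_welch, df_welch = welch_t(pre_all, post_all)

print(f"paired (matched only): t = {t_paired:.3f}, df = {df_paired}")
print(f"Welch (all samples):   t = {t_welch:.3f}, df = {df_welch:.1f}")
```

In this sketch the paired statistic uses only the 20 matched pairs, while the Welch statistic uses all 60 responses per phase but computes its standard error as if the two phases were independent. These are the two trade-offs the abstract identifies.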