A Feasibility Study of Differentially Private Summary Statistics and Regression Analyses for Administrative Tax Data

arXiv (Cornell University) (2022)

Abstract
Federal administrative tax data are invaluable for research, but because of privacy concerns, access to these data is typically limited to select agencies and a few individuals. An alternative to sharing microlevel data is a validation server, which allows individuals to query statistics without directly accessing the confidential data. This paper studies the feasibility of using differentially private (DP) methods to implement such a server. We provide an extensive study of existing DP methods for releasing tabular statistics, means, quantiles, and regression estimates. We also include new methodological adaptations of existing DP regression methods to handle new data types and return standard error estimates. We evaluate the selected methods based on the accuracy of the output for statistical analyses, using real administrative tax data obtained from the Internal Revenue Service. Our findings show that a validation server is feasible for simple, univariate statistics but struggles to produce accurate regression estimates and confidence intervals. We outline challenges and offer recommendations for future work on validation server frameworks. This is the first comprehensive statistical study of DP regression methodology on a real, complex dataset, and it has significant implications for the direction of a growing research field and for public policy.
Keywords
private summary statistics, administrative tax data, regression analyses