Bayesian nonparametric disclosure risk assessment

Electronic Journal of Statistics (2021)

Abstract
Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number tau(1) of sample uniques that are also population uniques. In such a context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators of tau(1) with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of producing underestimates of tau(1) in realistic scenarios, with the underestimation getting worse as the tail behaviour of the empirical distribution of microdata gets heavier. To fix this underestimation phenomenon, we propose a Bayesian nonparametric partition-based model that can be tuned to the tail behaviour of the empirical distribution of microdata. Our model relies on the Pitman-Yor process prior, and it leads to a novel estimator of tau(1) that retains all the desirable features of partition-based estimators and, in addition, allows underestimation to be reduced by tuning a "discount" parameter. We show the effectiveness of our estimator through its application to synthetic data and real data.
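To make the quantity tau(1) concrete, the following minimal Python sketch computes it directly on a synthetic population where the ground truth is known. It is not the estimator proposed in the paper: it only illustrates the definition (records that are unique in the sample and also unique in the population). The synthetic population is drawn with the standard Pitman-Yor sequential (Chinese restaurant) scheme, so that a larger discount gives a heavier-tailed partition. The function names, the `strength` parameter name, and all numeric settings are assumptions made for illustration.

```python
import random
from collections import Counter


def pitman_yor_partition(n, discount, strength, rng):
    """Assign n records to cells with the Pitman-Yor sequential scheme.

    Assumes 0 <= discount < 1 and strength > 0 (illustrative constraints).
    Returns a list of integer cell labels, one per record.
    """
    labels = []
    sizes = []  # sizes[j] = number of records currently in cell j
    for _ in range(n):
        k = len(sizes)
        # Existing cell j has weight (sizes[j] - discount);
        # a brand-new cell has weight (strength + discount * k).
        weights = [s - discount for s in sizes] + [strength + discount * k]
        j = rng.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)
        else:
            sizes[j] += 1
        labels.append(j)
    return labels


def tau1(population_labels, sample_indices):
    """Count sample uniques that are also population uniques."""
    pop_counts = Counter(population_labels)
    sample_counts = Counter(population_labels[i] for i in sample_indices)
    return sum(
        1
        for cell, count in sample_counts.items()
        if count == 1 and pop_counts[cell] == 1
    )


if __name__ == "__main__":
    rng = random.Random(0)
    population_size, sample_size = 2000, 200  # hypothetical sizes
    for discount in (0.0, 0.5, 0.8):
        population = pitman_yor_partition(population_size, discount, 1.0, rng)
        sample = rng.sample(range(population_size), sample_size)
        print(f"discount={discount}: tau1={tau1(population, sample)}")
```

In practice only the sample is observed, so tau(1) must be estimated; the sketch uses the full population solely to show what such an estimator is targeting.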
Key words
Bayesian nonparametrics, data confidentiality, Dirichlet process prior, disclosure risk assessment, empirical Bayes, Pitman-Yor process prior