Provable Privacy with Non-Private Pre-Processing
arxiv(2024)
摘要
When analysing Differentially Private (DP) machine learning pipelines, the
potential privacy cost of data-dependent pre-processing is frequently
overlooked in privacy accounting. In this work, we propose a general framework
to evaluate the additional privacy cost incurred by non-private data-dependent
pre-processing algorithms. Our framework establishes upper bounds on the
overall privacy guarantees by utilising two new technical notions: a variant of
DP termed Smooth DP and the bounded sensitivity of the pre-processing
algorithms. In addition to the generic framework, we provide explicit overall
privacy guarantees for multiple data-dependent pre-processing algorithms, such
as data imputation, quantization, deduplication and PCA, when used in
combination with several DP algorithms. Notably, this framework is also simple
to implement, allowing direct integration into existing DP pipelines.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要