Improving Time Complexity and Utility of k-anonymous Microaggregation.

ICETE (Selected Papers)(2021)

Abstract
Research in medicine, economics and the social sciences needs data about specific individuals. Such data should therefore be publicly available, but publishing it must not violate the privacy of any individual. Microaggregation applied to databases is a standard technique to protect privacy. It clusters similar individuals into larger groups to achieve so-called k-anonymity: every individual is hidden in a cluster of size at least k. The data can then be made public for all kinds of analysis, whereas other concepts such as differential privacy keep the database secret and allow outsiders to ask only specific questions about the data. The modification of a database to achieve anonymity should be as small as possible to preserve its utility, that is, the loss of information should be minimized. In this respect microaggregation typically performs much better than other anonymization techniques such as generalization or suppression. However, minimizing the information loss of k-anonymous microaggregation is an NP-hard optimization problem for $$k \ge 3$$. Not only is it unlikely that optimal solutions can be computed efficiently, nontrivial approximations are lacking, too. Therefore, a number of heuristics, all with at least quadratic time complexity, have been developed. This paper improves microaggregation significantly and provides a tradeoff between computational effort and utility. First, we analyse and tune in detail the maximum distance methodology, the common approach to generate a clustering that provides k-anonymity. We review the methods proposed so far and design a new algorithm $$\texttt{MDAV}^{*}_\gamma$$ that gives better utility on standard benchmarks. A different approach of quadratic time complexity based on Lloyd's algorithm has been proposed and named ONA, but has not been completely analysed. This paper fills this gap and improves several steps, resulting in a new algorithm $$\texttt{ONA}^{*}$$ with better utility.
Mondrian is another approach for clustering data that can be adapted for microaggregation. It is quite fast, but typically achieves very poor utility. We improve on this and design an almost linear time algorithm that gives acceptable utility, although worse than that of the quadratic time algorithms. Finally, we combine both techniques, ONA and Mondrian, to construct a new class of parameterized algorithms called $$\texttt{MONA}$$. They are quite fast, with time complexity between almost linear and quadratic, and deliver competitive utility compared to the MDAV approach.
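
To make the notions of k-anonymous clustering and information loss concrete, the following is a minimal sketch of a generic maximum-distance heuristic in the style of the classic MDAV scheme. It is not the paper's $$\texttt{MDAV}^{*}_\gamma$$, $$\texttt{ONA}^{*}$$ or $$\texttt{MONA}$$; the function names (`centroid`, `take_cluster`, `microaggregate`, `sse`), the use of Euclidean distance, and the choice of the sum of squared errors as the information-loss measure are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def take_cluster(remaining, seed, k):
    """Remove and return the k records of `remaining` closest to `seed`."""
    remaining.sort(key=lambda p: dist(p, seed))
    cluster = remaining[:k]
    del remaining[:k]
    return cluster


def microaggregate(points, k):
    """Partition `points` into clusters of size between k and 2k - 1.

    Illustrative maximum-distance heuristic (assumption: classic MDAV-style
    steps, not the tuned variants described in the abstract).
    """
    remaining = list(points)
    if len(remaining) < k:
        raise ValueError("need at least k records")
    clusters = []
    while len(remaining) >= 3 * k:
        c = centroid(remaining)
        r = max(remaining, key=lambda p: dist(p, c))  # farthest from centroid
        s = max(remaining, key=lambda p: dist(p, r))  # farthest from r
        clusters.append(take_cluster(remaining, r, k))
        clusters.append(take_cluster(remaining, s, k))
    if len(remaining) >= 2 * k:
        c = centroid(remaining)
        r = max(remaining, key=lambda p: dist(p, c))
        clusters.append(take_cluster(remaining, r, k))
    clusters.append(remaining)  # between k and 2k - 1 records are left here
    return clusters


def sse(clusters):
    """Information loss as the sum of squared distances to cluster centroids."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(dist(p, c) ** 2 for p in cluster)
    return total


# Hypothetical toy data: two well-separated groups of three records each.
records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = microaggregate(records, k=3)
print(clusters, sse(clusters))
```

Each pass of the loop sorts all remaining records, so this sketch takes more than quadratic time in the worst case; it is meant only to show why every released record ends up in a group of at least k and how a smaller `sse` corresponds to higher utility.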
Keywords
time complexity, k-anonymous