Balancing Data Utility versus Information Loss in Data-Privacy Protection using k-Anonymity

2020 IEEE 8th Conference on Systems, Process and Control (ICSPC), 2020

Abstract
Data privacy has been an important area of research in recent years. Datasets often contain sensitive data fields, exposure of which may jeopardize the interests of the individuals associated with the data. To address this issue, privacy techniques can be used to hinder the identification of a person by anonymizing the sensitive data in the dataset, while the anonymized dataset can still be used by third parties for analysis. In this research, we investigated a privacy technique, k-anonymity, for different values of $k$ applied to different numbers $c$ of columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files are then classified by several machine-learning algorithms, namely Naive Bayes, J48 and a neural network, in order to check the balance between data anonymity and data utility. Based on the classification accuracy, the optimal values of $k$ and $c$ are obtained, and these can then be used by the k-anonymity algorithm to anonymize the optimal number of columns of the dataset.
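To make the idea of k-anonymity over a chosen set of columns concrete, the following is a minimal Python sketch. It is not the algorithm or the information-loss metric used in the paper, neither of which is specified in the abstract. It assumes integer-valued columns, generalizes them into fixed-width buckets, and uses a hypothetical loss measure (the fraction of distinct value combinations that disappear after generalization). All function names, the bucket widths and the toy records are illustrative assumptions.

```python
from collections import Counter


def generalize(value, width):
    """Map an integer to the lower bound of its width-sized bucket."""
    return (value // width) * width


def anonymize(rows, columns, width):
    """Return copies of the rows with the chosen columns generalized."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            new_row[col] = generalize(row[col], width)
        out.append(new_row)
    return out


def is_k_anonymous(rows, columns, k):
    """True if every value combination over `columns` occurs at least k times."""
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(count >= k for count in groups.values())


def smallest_width(rows, columns, k, widths=(1, 5, 10, 20, 50, 100)):
    """Return the first candidate bucket width that yields k-anonymity, else None."""
    for width in widths:
        if is_k_anonymous(anonymize(rows, columns, width), columns, k):
            return width
    return None


def information_loss(original, anonymized, columns):
    """Hypothetical loss: share of distinct value combinations that were merged away."""
    before = len({tuple(r[c] for c in columns) for r in original})
    after = len({tuple(r[c] for c in columns) for r in anonymized})
    return 1 - after / before


# Toy records (assumed data, not taken from the paper).
rows = [
    {"age": 23, "zip": 4711, "label": "a"},
    {"age": 27, "zip": 4713, "label": "b"},
    {"age": 31, "zip": 4722, "label": "a"},
    {"age": 38, "zip": 4728, "label": "b"},
]
columns = ["age", "zip"]  # the c = 2 columns chosen for anonymization

width = smallest_width(rows, columns, k=2)
anonymized = anonymize(rows, columns, width)
loss = information_loss(rows, anonymized, columns)
print(width, loss)
```

Repeating this for several values of k and several column subsets, and then training a classifier on each anonymized output, would give the kind of anonymity-versus-utility comparison the abstract describes.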
Keywords
Data Utility,Information loss,Privacy Protection,K-Anonymity