Time-series Anonymization of Tabular Health Data using Generative Adversarial Network.

IJCNN (2023)

Abstract
Data anonymization is a fundamental tool in various domains, e.g. healthcare, for altering personal data so that individuals can no longer be identified directly or indirectly, thereby enabling broader sharing of data. For example, data perturbation techniques add noise to the original data, preserving the confidentiality of individual records while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model first learns a latent space of the original data to better capture temporal trends, then employs a generative adversarial network to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise is used together with the unsupervised adversarial, anonymization, and reconstruction loss functions. To evaluate our model quantitatively, we use multiple evaluation metrics for the fidelity, utility, and identifiability of the generated data. In addition, the model is evaluated qualitatively by visualizing the generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.
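To make the idea of sequence-dependent perturbation concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a simple AR(1) noise process as a stand-in for the learned perturbation generator. The function name `perturb_series`, its parameters, and the sample readings are all hypothetical and chosen only for illustration.

```python
import random


def perturb_series(values, noise_scale=0.1, smoothing=0.8, seed=None):
    """Add temporally correlated Gaussian noise to one numeric time series.

    Illustrative assumption: an AR(1) noise process stands in for the
    learned, sequence-dependent perturbation generator described in the
    abstract. This is not the paper's GAN-based method.
    """
    rng = random.Random(seed)
    noise = 0.0
    perturbed = []
    for v in values:
        # Each step's noise depends on the previous step's noise, so the
        # perturbation drifts smoothly instead of jumping independently.
        noise = smoothing * noise + rng.gauss(0.0, noise_scale)
        perturbed.append(v + noise)
    return perturbed


# Hypothetical heart-rate readings for one patient across five visits.
heart_rate = [72.0, 75.0, 71.0, 78.0, 74.0]
anonymized = perturb_series(heart_rate, noise_scale=1.5, seed=0)
```

In this sketch, a larger `noise_scale` hides the original values more strongly at the cost of fidelity, which is the privacy and utility trade-off that the abstract's evaluation metrics are meant to measure.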
Key words
generative adversarial networks, anonymization, synthetic data, data perturbation, EHR