De-Identification and Policy Implications for Common Data Model: Review for Medical Data Anonymity (Preprint)

Semantic Scholar (2020)

Abstract
The common data model (CDM) is a data representation standard that unifies the observational database schema across medical institutions, so the same analysis tools can be applied to each. Although analyses of CDM data do not directly examine a medical institution's original data, privacy issues cannot be avoided, so it is essential to establish a policy that accounts for the environment in which the CDM database is operated. The Observational Medical Outcomes Partnership common data model (OMOP CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics (OHDSI), is designed to eliminate most personal information when the database is constructed. However, when a medical institution's database is transformed into the OMOP CDM structure, the original data are retained in "source_value" fields to minimize information loss, and these fields may enable re-identification of individuals. This review presents a de-identification strategy for the original data that can be considered when operating a CDM database in a public computing environment such as the cloud. We further evaluate the re-identification risk of the CDM database under the proposed strategy using privacy models such as k-anonymity, l-diversity, and t-closeness. The analysis shows that the CDM database is highly anonymized on average (the highest re-identification record ratio is 11.3%), but every table in the CDM database contains one or more re-identifiable records. Applying the de-identification strategy was confirmed to reduce the risk of re-identification significantly.
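The privacy models named above can be illustrated with a small sketch. This Python is not the authors' method: the toy rows, the quasi-identifier choice (year_of_birth, gender_source_value), the function names, the 20-year banding step used as a stand-in de-identification strategy, and the definition of a "re-identifiable record" as one that falls in an equivalence class smaller than k are all assumptions made for illustration. l-diversity is computed in its simplest "distinct" form, and t-closeness uses total variation distance, which equals the Earth Mover's Distance when all categorical values are treated as equally far apart.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence

Record = Dict[str, Hashable]


def equivalence_classes(records: List[Record], qis: Sequence[str]) -> List[List[int]]:
    """Group record indices that share identical quasi-identifier values."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[q] for q in qis)].append(i)
    return list(groups.values())


def k_anonymity(records: List[Record], qis: Sequence[str]) -> int:
    """k is the size of the smallest equivalence class."""
    return min(len(g) for g in equivalence_classes(records, qis))


def l_diversity(records: List[Record], qis: Sequence[str], sensitive: str) -> int:
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(
        len({records[i][sensitive] for i in g})
        for g in equivalence_classes(records, qis)
    )


def t_closeness(records: List[Record], qis: Sequence[str], sensitive: str) -> float:
    """Largest distance between a class's sensitive-value distribution and the
    whole table's, measured as total variation distance (equal-distance EMD)."""
    n = len(records)
    overall = Counter(r[sensitive] for r in records)
    worst = 0.0
    for g in equivalence_classes(records, qis):
        local = Counter(records[i][sensitive] for i in g)
        dist = 0.5 * sum(abs(local[v] / len(g) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst


def reidentification_ratio(records: List[Record], qis: Sequence[str], k: int) -> float:
    """Share of records in classes smaller than k (an assumed definition)."""
    risky = sum(len(g) for g in equivalence_classes(records, qis) if len(g) < k)
    return risky / len(records)


def generalize_year(records: List[Record], width: int) -> List[Record]:
    """Hypothetical de-identification step: coarsen year_of_birth into bands."""
    return [
        {**r, "year_of_birth": r["year_of_birth"] - r["year_of_birth"] % width}
        for r in records
    ]


# Invented toy rows, loosely shaped like an OMOP CDM PERSON row joined to a condition.
ROWS: List[Record] = [
    {"year_of_birth": 1980, "gender_source_value": "F", "condition": "diabetes"},
    {"year_of_birth": 1980, "gender_source_value": "F", "condition": "asthma"},
    {"year_of_birth": 1980, "gender_source_value": "F", "condition": "diabetes"},
    {"year_of_birth": 1975, "gender_source_value": "M", "condition": "asthma"},
    {"year_of_birth": 1975, "gender_source_value": "M", "condition": "hypertension"},
    {"year_of_birth": 1962, "gender_source_value": "M", "condition": "diabetes"},
]
QIS = ("year_of_birth", "gender_source_value")

if __name__ == "__main__":
    for label, data in (("original", ROWS), ("20-year bands", generalize_year(ROWS, 20))):
        print(
            label,
            "k =", k_anonymity(data, QIS),
            "l =", l_diversity(data, QIS, "condition"),
            "t = %.3f" % t_closeness(data, QIS, "condition"),
            "ratio(k<2) = %.3f" % reidentification_ratio(data, QIS, 2),
        )
```

On these invented rows, the single 1962 record is unique, so k is 1 and one record in six counts as re-identifiable. Coarsening birth years into 20-year bands merges it into a larger class, raising k to 3 and l to 2 and dropping the ratio to 0. This mirrors, in miniature, the abstract's point that generalizing retained source values can lower re-identification risk. In practice the quasi-identifiers and the threshold k would come from the institution's own policy rather than from this sketch.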