Privacy-Preserving Record Linkage With Spark
2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID)(2019)
摘要
Privacy considerations obligate careful and secure processing of personal data. This is especially true when personal data is linked against databases from other organizations. During such endeavours, privacy-preserving record linkage (PPRL) can be utilized to prevent needless exposure of sensitive information to other organizations. With the increase of personal data that is being gathered and analyzed, scalable PPRL capable of handling massive databases is much desired. In this work, we evaluate Apache Spark as an option to scale PPRL. Not only is it valuable to have a scalable PPRL implementation, but one based on the Spark would also be commonly deployable and could take advantage of further development of the ecosystem. Our results show that a PPRL solution based on Spark outperforms alternatives when it conies to handling multiple millions of records; can scale to dozens of nodes; and is on-par with regular record linkage implementations in terms of achieved results.
更多查看译文
关键词
linkage, PPRL, privacy, scalability, Spark
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要