High-Performance Probabilistic Record Linkage Via Multi-Dimensional Homomorphisms

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (2019)

Abstract
Probabilistic Record Linkage (PRL) identifies data records that refer to the same real-world entity, e.g., within a database. PRL is increasingly used by epidemiology centers, intelligence agencies, and universities. However, PRL is time-consuming, which limits its applicability to large data sets in real-world applications. We address this problem by parallelizing PRL for modern high-performance architectures such as multi-core CPUs and many-core GPUs. Our approach relies on the formalism of Multi-Dimensional Homomorphisms (MDHs), a class of functions that has a generic parallel implementation in OpenCL. This implementation schema can be automatically optimized for a particular target hardware architecture by means of auto-tuning. Our experiments show significantly better performance on both CPU and GPU, with speedups of up to 80 times, compared to the parallel PRL implementation currently used by EKR, the largest cancer registry in Europe.
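The abstract does not spell out the paper's PRL formulation, so the following is only a minimal sketch of why PRL fits the MDH pattern. It assumes a Fellegi-Sunter-style score with hypothetical per-field probabilities `M_PROBS`/`U_PROBS`, and treats linkage as a two-dimensional computation over (record i of set A, record j of set B) that returns, for each record in A, its best-scoring candidate in B. The names `match_weight`, `combine_j`, and `link` are hypothetical; this is plain sequential Python, not the paper's OpenCL implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical Fellegi-Sunter parameters (assumption, not from the paper):
# m = P(field agrees | same entity), u = P(field agrees | different entities).
M_PROBS = [0.95, 0.90, 0.85]
U_PROBS = [0.01, 0.05, 0.10]

Record = Tuple[str, ...]
Best = Tuple[float, int]  # (match weight, index of the candidate in B)


def match_weight(a: Record, b: Record) -> float:
    """Sum of per-field log2 agreement/disagreement weights."""
    w = 0.0
    for x, y, m, u in zip(a, b, M_PROBS, U_PROBS):
        if x == y:
            w += math.log2(m / u)
        else:
            w += math.log2((1 - m) / (1 - u))
    return w


def combine_j(x: Best, y: Best) -> Best:
    """Combine operator along dimension j: keep the higher weight.

    Ties go to the lower index, so the operator is associative and
    commutative, which is what allows j to be split across workers.
    """
    if x[0] > y[0] or (x[0] == y[0] and x[1] <= y[1]):
        return x
    return y


def link(A: Sequence[Record], B: Sequence[Record], j0: int = 0) -> List[Best]:
    """Best candidate in B for every record in A.

    Dimension i (records of A) combines by list concatenation;
    dimension j (records of B) combines with combine_j.
    j0 offsets the reported indices when B is a chunk of a larger set.
    """
    result = []
    for a in A:
        best: Best = (-math.inf, -1)  # identity element of combine_j
        for j, b in enumerate(B):
            best = combine_j(best, (match_weight(a, b), j0 + j))
        result.append(best)
    return result


# Small made-up data set for illustration only.
A = [("anna", "1980", "koeln"), ("ben", "1975", "bonn"), ("cara", "1990", "essen")]
B = [("ben", "1975", "bonn"), ("anna", "1981", "koeln"),
     ("cara", "1990", "essen"), ("anna", "1980", "koln")]

full = link(A, B)
print([j for _, j in full])  # [3, 0, 2]

# Homomorphism along i: split A, link each half, concatenate.
assert link(A[:1], B) + link(A[1:], B) == full

# Homomorphism along j: split B, link each chunk, merge with combine_j.
left, right = link(A, B[:2]), link(A, B[2:], j0=2)
assert [combine_j(x, y) for x, y in zip(left, right)] == full
```

Because each dimension has its own combine operator (concatenation for i, max-selection for j), either dimension can be partitioned into independent chunks whose partial results are merged afterwards. That splitting freedom is what a generic MDH implementation exploits when mapping the computation onto CPU or GPU threads, with chunk sizes left as tuning parameters.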
Keywords
probabilistic record linkage, high-performance, multi-dimensional