SynTL: A synthetic-data-based transfer learning approach for multi-center risk prediction

T. Gu, R. Duan

medRxiv (2022)

Abstract
Objectives: We propose a synthetic-data-based transfer learning approach (SynTL) that incorporates multi-center healthcare data to improve risk prediction for a target population while accounting for heterogeneity, data-sharing restrictions, and privacy constraints. Methods: SynTL combines the locally trained risk prediction model from each source population with the target population data to adjust for data heterogeneity through a flexible distance-based transfer learning approach. Heterogeneity-adjusted synthetic data are then generated for source populations whose individual-level data are not shareable. The synthetic data are combined with the target and source data for joint training of the target model. We evaluate SynTL via extensive simulation studies and an application to multi-center data from the electronic Medical Records and Genomics (eMERGE) Network to predict extreme obesity. Results: Simulation studies show that, over a wide spectrum of settings, SynTL outperforms methods that do not adjust for data heterogeneity and methods that are trained in a single population. SynTL has low communication costs: each participating site only needs to share parameter estimates with the target, and only one round of communication is required. SynTL protects against negative transfer when some source populations are highly different from the target. Using eMERGE data, SynTL achieves an area under the receiver operating characteristic curve (AUC) of around 0.79, which outperforms other benchmark methods (0.50 to 0.67). Conclusion: SynTL improves risk prediction performance for the target population and is robust to the level of heterogeneity between the target and source populations. It protects patient-level information and is highly communication efficient.
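The three steps named under Methods (heterogeneity adjustment, synthetic data generation, joint training) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance measure, the synthetic-data generator, or the model class. The sketch assumes a logistic risk model, uses a ridge penalty on the shift between source and target coefficients as a stand-in for the "distance-based" adjustment, resamples target covariates to build synthetic records, and assumes no source site can share individual-level data. All function names (`fit_logistic`, `simulate`, `syntl_sketch`) and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, offset=None, ridge=0.0, n_iter=500, lr=0.5):
    """Logistic regression by gradient descent, with an optional fixed
    offset in the linear predictor and an optional ridge penalty."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta + offset) - y) / n + ridge * beta
        beta -= lr * grad
    return beta

def simulate(n, beta, rng):
    """Toy data: intercept plus standard normal covariates (an assumption)."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    y = rng.binomial(1, sigmoid(X @ beta))
    return X, y

def syntl_sketch(X_tgt, y_tgt, source_betas, n_syn, rng, ridge=0.1):
    X_parts, y_parts = [X_tgt], [y_tgt]
    for beta_src in source_betas:
        # Step 1 (assumed form): estimate a shift delta on the target data,
        # holding the source linear predictor fixed as an offset; the ridge
        # penalty keeps the adjusted model close to the source model.
        delta = fit_logistic(X_tgt, y_tgt, offset=X_tgt @ beta_src, ridge=ridge)
        beta_adj = beta_src + delta
        # Step 2 (assumed form): resample target covariates and draw
        # outcomes from the heterogeneity-adjusted source model.
        idx = rng.integers(0, len(y_tgt), size=n_syn)
        X_syn = X_tgt[idx]
        y_syn = rng.binomial(1, sigmoid(X_syn @ beta_adj))
        X_parts.append(X_syn)
        y_parts.append(y_syn)
    # Step 3: joint training on target data plus all synthetic data.
    return fit_logistic(np.vstack(X_parts), np.concatenate(y_parts))

rng = np.random.default_rng(0)
beta_target = np.array([-1.0, 1.0, -0.5])
X_tgt, y_tgt = simulate(200, beta_target, rng)

# Each source site fits its own model locally and shares only coefficients,
# which corresponds to the single round of communication in the abstract.
source_betas = []
for shift in (0.2, 1.5):  # one similar source, one very different source
    beta_src_true = beta_target + shift * np.array([0.0, 1.0, 1.0])
    X_src, y_src = simulate(2000, beta_src_true, rng)
    source_betas.append(fit_logistic(X_src, y_src))

beta_hat = syntl_sketch(X_tgt, y_tgt, source_betas, n_syn=500, rng=rng)
print(beta_hat)
```

In this sketch only `source_betas` crosses site boundaries, so no patient-level source records reach the target. The `ridge` value controls how strongly each adjusted model is held to its source estimate; how SynTL itself tunes the adjustment and guards against negative transfer is not described in the abstract.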