Abstract 11005: The Impact of Time Censoring on Machine Learning Models Which Identify Patients With Undiagnosed Cardiac Amyloidosis

Greg Lee, Sam Fielden, Brendan Carry, Alvaro Ulloa Cerna, Linyuan Jing, Arun Nemani, Daniel Rocha, K. Flick, Jeffrey Ruhl, Jagadish Venkataraman, Noah Zimmerman, Ruijun Chen, Brandon K. Fornwalt, Christopher M. Haggerty

Circulation (2022)

Abstract
Introduction: Cardiac amyloidosis (CA) is a common cause of progressive heart failure. New therapies can improve outcomes, but most CA patients remain undiagnosed and untreated. Machine learning models deployed on electronic health record (EHR) data may be able to find patients with undiagnosed CA. To date, most models have focused on identifying undiagnosed amyloid from uncensored data modalities (Fig 1).

Hypothesis: We hypothesized that omitting post-diagnosis censoring when training CA models leads to poor performance in identifying patients with undiagnosed CA, whereas training with appropriate time censoring improves performance.

Methods: We used 41 EHR features (demographics, labs, electrocardiogram/echocardiography measurements, vitals) to train a boosted decision tree model with and without time censoring. The model was applied to 112 patients with confirmed CA and 22,400 controls matched on age, sex, encounter frequency, and EHR timespan. We also compared our findings with a web-based CA algorithm that was publicly available in 2020.

Results: On at-risk, time-censored patients, the EHR algorithm had modestly higher performance when trained with versus without time censoring (area under the receiver operating characteristic curve (AUROC): 0.84±0.09 vs 0.79±0.07). Testing on temporally uncensored data showed higher performance (AUROC: 0.91±0.05), which may be unrepresentative of deployment scenarios where post-diagnostic features are unavailable for model use. The publicly available algorithm demonstrated a similar trend, performing better on uncensored data (AUROC: 0.67±0.03) than on an appropriately censored feature set (AUROC: 0.54±0.04).

Conclusions: EHR algorithms can be trained to find patients at high risk of undiagnosed cardiac amyloidosis. These models should be evaluated on temporally censored data so that post-diagnostic features do not artificially inflate performance estimates and negatively impact real-world deployment.
Keywords
time censoring,amyloidosis,machine learning models,machine learning