Interpretable machine learning model using Extreme Gradient Boosting with Nonnegative Matrix Factorization improves the accuracy of arrhythmic risk prediction in Brugada Syndrome

G. Tse, J. Zhou, S. Li, S. Lee, K. H. C. Li, K. S. K. Leung, W. T. Wong, P. Mililis, D. Asvestas, G. Bazoukis, K. P. Letsas, T. Liu, Q. Zhang

European Heart Journal (2023)

Abstract

Background: Both depolarization and repolarization abnormalities contribute to ventricular arrhythmogenesis in Brugada syndrome (BrS). In this study, we tested the hypothesis that incorporating latent features extracted by various nonnegative matrix factorization (NMF) techniques into an interpretable machine learning (IML) prediction model can outperform both IML models without latent variables and a logistic regression model.

Methods: This study was based on a published anonymised dataset of BrS patients from Hong Kong, China. XGBoost was selected as the IML model, with and without incorporating latent features using 11 different NMF techniques: Bayesian nonnegative matrix factorization (BNMF), Iterated Conditional Modes nonnegative matrix factorization (ICM), Fisher Nonnegative Matrix Factorization for learning Local features (LFNMF), Alternating Nonnegative Least Squares Matrix Factorization Using Projected Gradient (bound constrained optimization) method for each subproblem (LSNMF), Non-smooth Nonnegative Matrix Factorization (NSNMF), Probabilistic Nonnegative Matrix Factorization (PMF), Probabilistic Sparse Matrix Factorization (PSMF), Sparse Nonnegative Matrix Factorization (SNMF) based on alternating nonnegativity constrained least squares, Sparse Network-Regularized Multiple Nonnegative Matrix Factorization (SNMNMF), Penalized Matrix Factorization for Constrained Clustering (PMFCC) and Separable Nonnegative Matrix Factorization (SepNMF).

Results: A total of 548 patients were included (7.3% females, age at diagnosis: 51.0 [38.0-61.0] years old). Of these, 66 suffered from spontaneous ventricular tachyarrhythmias over a follow-up of 84±55 months. The baseline model using multivariable logistic regression achieved an area under the curve (AUC) of 0.78 [0.72-0.85], which was improved to 0.88 [0.83-0.93] for the IML model without NMF. The AUC was further increased by incorporating additional latent variables extracted using NSNMF (0.95 [0.92-0.98]) and PMFCC (0.94 [0.88-1.00]).

Conclusion: Incorporation of latent variables by different NMF techniques into an IML prediction model significantly improved the accuracy of risk prediction in BrS.
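The idea of adding NMF-derived latent variables to a feature matrix can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses the basic Lee-Seung multiplicative-update NMF rather than any of the eleven variants named in the abstract, it is written in plain Python to stay dependency-free, and the function names (`nmf`, `augment`, `frobenius_error`) and toy values are hypothetical. In a workflow like the one described, the augmented matrix would then be passed to a gradient-boosted classifier such as XGBoost, which is not shown here.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - W @ H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative X (n x m) as W (n x rank) @ H (rank x m).

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    This is an illustrative stand-in, not one of the abstract's NMF variants.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


def augment(X, W):
    """Append each sample's latent coordinates (row of W) to its original features."""
    return [list(x) + list(w) for x, w in zip(X, W)]


if __name__ == "__main__":
    # Toy nonnegative matrix: 4 hypothetical samples x 3 hypothetical features.
    X_toy = [[1.0, 0.5, 2.0], [2.0, 1.0, 4.1], [0.0, 3.0, 0.5], [0.1, 2.9, 0.4]]
    W_toy, H_toy = nmf(X_toy, rank=2)
    print("reconstruction error:", frobenius_error(X_toy, W_toy, H_toy))
    print("augmented row 0:", augment(X_toy, W_toy)[0])
```

Each row of `W` gives one sample's coordinates in the latent space, so concatenating `W` with the original columns yields a feature matrix containing both the observed and the latent variables.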
Keywords
arrhythmic risk prediction, interpretable machine learning model, extreme gradient boosting, interpretable machine learning, Brugada syndrome