A probabilistic topic model for clinical risk stratification from electronic health records

Journal of Biomedical Informatics（2015）

引用 47|浏览74

暂无评分

摘要

Display Omitted A probabilistic topic modeling framework is proposed for risk stratification.Both patient sub-profiles and risk tiers are generated simultaneously from electronic health records.The patient sub-profiles and associated risk tiers detected by the proposed models are coherent and informative.Evaluation is done using a real clinical dataset from a Chinese hospital. Background and objectiveRisk stratification aims to provide physicians with the accurate assessment of a patients clinical risk such that an individualized prevention or management strategy can be developed and delivered. Existing risk stratification techniques mainly focus on predicting the overall risk of an individual patient in a supervised manner, and, at the cohort level, often offer little insight beyond a flat score-based segmentation from the labeled clinical dataset. To this end, in this paper, we propose a new approach for risk stratification by exploring a large volume of electronic health records (EHRs) in an unsupervised fashion. MethodsAlong this line, this paper proposes a novel probabilistic topic modeling framework called probabilistic risk stratification model (PRSM) based on Latent Dirichlet Allocation (LDA). The proposed PRSM recognizes a patient clinical state as a probabilistic combination of latent sub-profiles, and generates sub-profile-specific risk tiers of patients from their EHRs in a fully unsupervised fashion. The achieved stratification results can be easily recognized as high-, medium- and low-risk, respectively. In addition, we present an extension of PRSM, called weakly supervised PRSM (WS-PRSM) by incorporating minimum prior information into the model, in order to improve the risk stratification accuracy, and to make our models highly portable to risk stratification tasks of various diseases. ResultsWe verify the effectiveness of the proposed approach on a clinical dataset containing 3463 coronary heart disease (CHD) patient instances. Both PRSM and WS-PRSM were compared with two established supervised risk stratification algorithms, i.e., logistic regression and support vector machine, and showed the effectiveness of our models in risk stratification of CHD in terms of the Area Under the receiver operating characteristic Curve (AUC) analysis. As well, in comparison with PRSM, WS-PRSM has over 2% performance gain, on the experimental dataset, demonstrating that incorporating risk scoring knowledge as prior information can improve the performance in risk stratification. ConclusionsExperimental results reveal that our models achieve competitive performance in risk stratification in comparison with existing supervised approaches. In addition, the unsupervised nature of our models makes them highly portable to the risk stratification tasks of various diseases. Moreover, patient sub-profiles and sub-profile-specific risk tiers generated by our models are coherent and informative, and provide significant potential to be explored for the further tasks, such as patient cohort analysis. We hypothesize that the proposed framework can readily meet the demand for risk stratification from a large volume of EHRs in an open-ended fashion.

查看译文

关键词

Risk stratification,Probabilistic topic model,Latent Dirichlet Allocation,Electronic health record,Patient sub-profile,Joint patient sub-profile-risk modeling

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要