Machine-learning based prediction for high health care utilizers using a multi-institution diabetes registry: model training and evaluation. (Preprint)

crossref(2024)

Abstract
BACKGROUND The cost of healthcare is increasing rapidly in many countries, and there is growing interest in using machine learning to predict high healthcare utilizers for population health initiatives. Previous studies have focused on the individuals who contribute the highest financial burden. However, this group is small and offers limited opportunity for long-term cost reduction.
OBJECTIVE We developed an ensemble of models that predict future healthcare utilization at various thresholds.
METHODS We used 2019 data from a multi-institutional diabetes database to develop binary classification models. These models predict healthcare utilization in the subsequent year across six outcomes: a length of stay of ≥7, ≥14, and ≥30 days, and emergency department (ED) attendance of ≥3, ≥5, and ≥10 visits. To address class imbalance, random oversampling and synthetic minority oversampling techniques were employed. The models were then applied to unseen data from 2020 and 2021 to predict healthcare utilization in the following year. A portfolio of performance metrics, prioritizing area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value, was used for comparison.
RESULTS When trained with random oversampling, four models (logistic regression, multivariate adaptive regression splines, boosted trees, and multilayer perceptron) consistently achieved high AUC (>0.80) and sensitivity (>0.60) across the training-validation and test datasets. Correcting for class imbalance proved critical for model performance. Key predictors for all outcomes included age, number of ED visits in the present year, chronic kidney disease stage, inpatient bed days in the present year, and mean HbA1c level.
CONCLUSIONS We developed machine learning models that predict high service-level utilization with robust performance. These models can be integrated into wider diabetes-related population health initiatives.
CLINICALTRIAL Not Applicable
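The abstract does not give implementation details, so the following is only a minimal sketch of two ideas named in METHODS: random oversampling of the minority class and the three prioritized metrics (AUC, sensitivity, positive predictive value). It uses the Python standard library only. All function names are hypothetical and are not taken from the study, and the sketch assumes binary labels where 1 marks a high utilizer (for example, ≥3 ED visits in the following year).

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Hypothetical helper: illustrates random oversampling in general,
    not the study's actual pipeline.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Sample minority indices with replacement to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def sensitivity_ppv(y_true, y_pred):
    """Return (sensitivity, positive predictive value) for binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: 8 low utilizers and 2 high utilizers.
    toy_rows = list(range(10))
    toy_labels = [0] * 8 + [1] * 2
    _, balanced = random_oversample(toy_rows, toy_labels)
    print(len(balanced), sum(balanced))
```

In a sketch like this, oversampling would normally be applied to the training split only, so that validation and test metrics reflect the true class prevalence. Synthetic minority oversampling, which the abstract also mentions, interpolates new minority examples rather than duplicating existing ones and is not shown here.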