Evaluation of a machine-learning model based on laboratory parameters for the prediction of acute leukaemia subtypes: a multicentre model development and validation study in France

Vincent Alcazer, Gregoire Le Meur, Marie Roccon, Sabrina Barriere, Baptiste Le Calvez, Bouchra Badaoui, Agathe Spaeth, Olivier Kosmider, Nicolas Freynet, Marion Eveillard, Carolyne Croizier, Simon Chevalier, Pierre Sujobert

Lancet Digital Health (2024)

Abstract

Background Acute leukaemias are life-threatening haematological cancers characterised by the infiltration of transformed immature haematopoietic cells in the blood and bone marrow. Prompt and accurate diagnosis of the three main acute leukaemia subtypes (ie, acute lymphocytic leukaemia [ALL], acute myeloid leukaemia [AML], and acute promyelocytic leukaemia [APL]) is of utmost importance to guide initial treatment and prevent early mortality, but it requires cytological expertise that is not always available. We aimed to benchmark different machine-learning strategies using a custom variable selection algorithm to propose an extreme gradient boosting model to predict leukaemia subtypes on the basis of routine laboratory parameters.

Methods This multicentre model development and validation study was conducted with data from six independent French university hospital databases. Patients aged 18 years or older diagnosed with AML, APL, or ALL in any one of these six hospital databases between March 1, 2012, and Dec 31, 2021, were recruited. 22 routine parameters were collected at the time of initial disease evaluation; variables with more than 25% of missing values in two datasets were not used for model training, leading to the final inclusion of 19 parameters. The performance of the final model was evaluated on internal testing and external validation sets with area under the receiver operating characteristic curves (AUCs), and clinically relevant cutoffs were chosen to guide clinical decision making. The final tool, Artificial Intelligence Prediction of Acute Leukemia (AI-PAL), was developed from this model.

Findings 1410 patients diagnosed with AML, APL, or ALL were included. Data quality control showed few missing values for each cohort, with the exception of uric acid and lactate dehydrogenase for the cohort from Hôpital Cochin. 679 patients from Hôpital Lyon Sud and Centre Hospitalier Universitaire de Clermont-Ferrand were split into the training (n=477) and internal testing (n=202) sets. 731 patients from the four other cohorts were used for external validation. Overall AUCs across all validation cohorts were 0·97 (95% CI 0·95-0·99) for APL, 0·90 (0·83-0·97) for ALL, and 0·89 (0·82-0·95) for AML. Cutoffs were then established on the overall cohort of 1410 patients to guide clinical decisions. Confident cutoffs showed two (0·14%) wrong predictions for ALL, four (0·28%) wrong predictions for APL, and three (0·21%) wrong predictions for AML. Use of the overall cutoff greatly reduced the number of missing predictions; diagnosis was proposed for 1375 (97·5%) of 1410 patients for each category, with only a slight increase in wrong predictions. The final model evaluation across both the internal testing and external validation sets showed accuracy of 99·5% for ALL diagnosis, 98·8% for AML diagnosis, and 99·7% for APL diagnosis in the confident model and accuracy of 87·9% for ALL diagnosis, 86·3% for AML diagnosis, and 96·1% for APL diagnosis in the overall model.

Interpretation AI-PAL allowed for accurate diagnosis of the three main acute leukaemia subtypes. Based on ten simple laboratory parameters, its broad availability could help guide initial therapies in a context where cytological expertise is lacking, such as in low-income countries.

Copyright (c) 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC 4.0 license.
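The missing-value rule stated in Methods (variables with more than 25% missing values in two datasets were not used for training) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy data, and the reading of "in two datasets" as "in at least two datasets" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One patient record: variable name -> measured value, or None if missing.
Row = Dict[str, Optional[float]]


def missing_fraction(rows: List[Row], variable: str) -> float:
    """Fraction of rows in which `variable` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(variable) is None)
    return missing / len(rows)


def select_variables(
    datasets: Dict[str, List[Row]],
    variables: List[str],
    max_missing: float = 0.25,  # threshold taken from the abstract
    min_datasets: int = 2,      # assumption: "two datasets" means at least two
) -> List[str]:
    """Keep variables that exceed `max_missing` in fewer than `min_datasets` datasets."""
    kept = []
    for variable in variables:
        n_bad = sum(
            1
            for rows in datasets.values()
            if missing_fraction(rows, variable) > max_missing
        )
        if n_bad < min_datasets:
            kept.append(variable)
    return kept


# Hypothetical toy cohorts, not data from the study.
toy_datasets = {
    "site_a": [{"wbc": 5.0, "uric_acid": None}, {"wbc": 7.1, "uric_acid": None}],
    "site_b": [{"wbc": 4.2, "uric_acid": None}, {"wbc": None, "uric_acid": 300.0}],
    "site_c": [{"wbc": 6.0, "uric_acid": 280.0}, {"wbc": 5.5, "uric_acid": 310.0}],
}

selected = select_variables(toy_datasets, ["wbc", "uric_acid"])
```

In this toy example, `wbc` exceeds the threshold in only one cohort and is kept, whereas `uric_acid` exceeds it in two cohorts and is dropped.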
