
Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence

European Radiology (2023)

Abstract
Objectives To externally validate the performance of a commercial AI software program for interpreting CXRs in a large, consecutive, real-world cohort from primary healthcare centres.

Methods A total of 3047 CXRs were collected from two primary healthcare centres, characterised by low disease prevalence, between January and December 2018. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice with and without AI assistance. The performances of the AI and readers with and without AI assistance were measured in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.

Results The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630–0.665), 35.3% (CI, 24.7–47.8), and 94.2% (CI, 93.3–95.0), respectively. AI detected 12 of 41 pneumonia, 3 of 5 tuberculosis, and 9 of 22 tumours. AI-undetected lesions tended to be smaller than true-positive lesions. The readers’ AUROCs ranged from 0.534–0.676 without AI and 0.571–0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96–10.27 s longer with AI assistance (all p values < 0.05).

Conclusions The performance of commercial AI in these high-volume, low-prevalence settings was poorer than expected, although it modestly boosted the performance of less-experienced readers. The technical prowess of AI demonstrated in experimental settings and approved by regulatory bodies may not directly translate to real-world practice, especially where the demand for AI assistance is highest.

Key Points
• This study shows the limited applicability of commercial AI software for detecting abnormalities in CXRs in a health screening population.
• When using AI software in a specific clinical setting that differs from the training setting, it is necessary to adjust the threshold or perform additional training with such data that reflects this environment well.
• Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to examine AI software to be implemented in real clinical practice.
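
The arithmetic behind the reported figures can be checked with a short Python sketch. It uses only the counts stated in the abstract and assumes one clinically significant lesion per abnormal CXR. The false-positive count and the positive predictive value (PPV) are not reported in the abstract; they are derived here from the rounded specificity and should be read as approximations. All variable names are illustrative.

```python
# Counts as reported in the abstract.
total_cxrs = 3047
lesions = 68  # CXRs with clinically significant lesions

# (detected by AI, total) per lesion type, as reported.
detected = {
    "pneumonia": (12, 41),
    "tuberculosis": (3, 5),
    "tumour": (9, 22),
}

# The per-type totals should add up to the 68 abnormal CXRs.
assert sum(total for _, total in detected.values()) == lesions

true_positives = sum(hits for hits, _ in detected.values())
prevalence = lesions / total_cxrs
sensitivity = true_positives / lesions

# Assumption: the false-positive count is back-calculated from the
# rounded specificity (94.2%), so it is approximate, not a reported value.
reported_specificity = 0.942
negatives = total_cxrs - lesions
false_positives = round((1 - reported_specificity) * negatives)

# Derived quantity (not in the abstract): share of AI-positive CXRs
# that are truly abnormal at this prevalence.
ppv = true_positives / (true_positives + false_positives)

print(f"prevalence:  {prevalence:.1%}")
print(f"sensitivity: {sensitivity:.1%}")
print(f"approx. PPV: {ppv:.1%}")
```

Under these assumptions the prevalence and sensitivity reproduce the reported 2.2% and 35.3%, while the derived PPV is roughly 12%, which illustrates why a fixed threshold can perform poorly in a low-prevalence screening population.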
Key words
Artificial intelligence, Thoracic radiography, Software, Multicentre study, Validation study