Performance of a Breast Cancer Detection AI Algorithm Using the Personal Performance in Mammographic Screening Scheme

Radiology (2023)

Abstract
Background: The Personal Performance in Mammographic Screening (PERFORMS) scheme is used to assess reader performance. Whether this scheme can also assess the performance of artificial intelligence (AI) algorithms is unknown.
Purpose: To compare the performance of human readers and a commercially available AI algorithm interpreting PERFORMS test sets.
Materials and Methods: In this retrospective study, two PERFORMS test sets, each consisting of 60 challenging cases, were evaluated by human readers between May 2018 and March 2021 and by an AI algorithm in 2022. The AI algorithm considered each breast separately, assigning a suspicion of malignancy score to each feature it detected. Performance was assessed using the highest score per breast. Performance metrics, including sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), were calculated for AI and humans. The study was powered to detect a medium-sized effect (odds ratio, 3.5 or 0.29) for sensitivity.
Results: A total of 552 human readers interpreted both PERFORMS test sets, which together comprised 161 normal breasts, 70 malignant breasts, and nine benign breasts. No difference was observed at the breast level between the AUC for AI and the AUC for human readers (0.93 and 0.88, respectively; P = .15). When using the developer's suggested recall score threshold, no difference was observed for AI versus human reader sensitivity (84% and 90%, respectively; P = .34), but the specificity of AI was higher (89%) than that of the human readers (76%; P = .003). However, it was not possible to demonstrate equivalence due to the size of the test sets. When using recall thresholds chosen to match mean human reader performance (90% sensitivity, 76% specificity), AI showed no differences in performance, with a sensitivity of 91% (P = .73) and a specificity of 77% (P = .85).
Conclusion: Diagnostic performance of AI was comparable with that of the average human reader when evaluating cases from two enriched test sets from the PERFORMS scheme. © RSNA, 2023
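
The breast-level scoring described in Materials and Methods (highest suspicion score per breast, then sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the toy scores, the 0.5 recall threshold, and the choice to give a breast with no detected features a score of 0.0 are all assumptions made for illustration. The abstract does not state the developer's threshold or how benign breasts were labeled for analysis.

```python
def max_score_per_breast(detections):
    """Collapse (breast_id, score) detections to the highest score per breast."""
    best = {}
    for breast_id, score in detections:
        if breast_id not in best or score > best[breast_id]:
            best[breast_id] = score
    return best


def sensitivity_specificity(scores, labels, threshold):
    """Recall a breast when its score is >= threshold; compare with ground truth.

    Assumption: a breast with no detections gets a score of 0.0.
    """
    tp = fn = tn = fp = 0
    for breast_id, malignant in labels.items():
        recalled = scores.get(breast_id, 0.0) >= threshold
        if malignant:
            tp += recalled
            fn += not recalled
        else:
            fp += recalled
            tn += not recalled
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a malignant breast outscores a
    non-malignant one, counting ties as half (Mann-Whitney formulation)."""
    pos = [scores.get(b, 0.0) for b, m in labels.items() if m]
    neg = [scores.get(b, 0.0) for b, m in labels.items() if not m]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three cases (A, B, C), left and right breasts.
detections = [("A-L", 0.2), ("A-L", 0.9), ("A-R", 0.1),
              ("B-L", 0.6), ("B-R", 0.4), ("C-L", 0.3)]
labels = {"A-L": True, "A-R": False, "B-L": False,
          "B-R": True, "C-L": False, "C-R": False}  # True = malignant

scores = max_score_per_breast(detections)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(sens, spec, auc(scores, labels))
```

Lowering the threshold in this sketch trades specificity for sensitivity, which is the mechanism behind the second comparison in Results, where the recall threshold was chosen to match mean human reader performance.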