Validation of an artificial intelligence model for 12-lead ECG interpretation

A. Demolder, R. Herman, B. Vavrik, M. Martonak, V. Boza, M. Herman, T. Palus, V. Kresnakova, J. Bahyl, A. Iring, O. Nelis, D. Fabbricatore, L. Perl, R. Hatala, J. Bartunek

European Heart Journal (2023)

Abstract

Background: The electrocardiogram (ECG) is one of the most accessible and comprehensive diagnostic tools for assessing cardiac abnormalities. However, automated ECG interpretation remains inferior to physician interpretation in terms of accuracy and reliability.

Purpose: This study evaluated the accuracy of an AI-powered ECG model in diagnosing 12-lead ECGs and compared its diagnostic performance with that of primary care physicians and cardiologists through extensive benchmarking.

Methods: A deep neural network (DNN) was trained on standard 12-lead ECGs to detect 38 diagnoses, grouped into 6 categories (rhythm, conduction abnormalities, chamber enlargement, infarction, ectopy, and axis) that cover the most common types of electrocardiographic abnormalities. Performance of the AI-powered ECG diagnosis was evaluated on an independent test set annotated by consensus of two expert cardiologists. Benchmarking was performed against three individual primary care physicians and six individual cardiologists, who independently annotated the same ECG test set. The key metrics used to compare performance were positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, and F1 score.

Results: A total of 931,344 standard 12-lead ECGs from 172,750 patients were used to train the DNN. The independent test set had 11,932 annotated ECG labels. The model attained an overall mean F1 score of 0.921, sensitivity 0.910 (0.889–0.931), specificity 0.968 (0.954–0.981), PPV 0.939 (0.919–0.958), and NPV 0.965 (0.951–0.979) [Figure 1]. In all 6 diagnostic categories, the DNN achieved higher mean F1 scores than the mean cardiologist and the mean primary care physician (Rhythm 0.951 vs. 0.892 vs. 0.734; Conduction abnormalities 0.883 vs. 0.824 vs. 0.693; Chamber enlargement 0.970 vs. 0.761 vs. 0.562; Infarction 0.918 vs. 0.853 vs. 0.781; Ectopy 0.966 vs. 0.951 vs. 0.897; Axis 0.909 vs. 0.644 vs. 0.528, respectively). The DNN identified atrial fibrillation with near-perfect performance (PPV 0.989, NPV 0.990). Based on F1 scores for all individual diagnoses, its diagnostic performance surpassed that of primary care physicians and was non-inferior to that of cardiologists.

Conclusions: Our results demonstrate the AI-powered ECG model's ability to accurately identify electrocardiographic abnormalities from the 12-lead ECG, showcasing its utility as a clinical tool for healthcare professionals.
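
For readers less familiar with the five metrics reported above, the following is a minimal sketch of how they are conventionally derived from confusion-matrix counts for a single diagnosis, and how per-diagnosis F1 scores can be averaged. The function names and all counts are hypothetical and purely illustrative; the abstract does not describe the authors' actual evaluation code or averaging scheme.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Standard definitions for one diagnosis treated as present/absent."""
    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),    # true positive rate (recall)
        specificity=_ratio(tn, tn + fp),    # true negative rate
        ppv=_ratio(tp, tp + fp),            # positive predictive value (precision)
        npv=_ratio(tn, tn + fn),            # negative predictive value
        f1=_ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    )


# Hypothetical counts for two diagnoses (not data from the study).
example_counts = {
    "diagnosis_a": dict(tp=90, fp=10, tn=880, fn=20),
    "diagnosis_b": dict(tp=45, fp=5, tn=940, fn=10),
}

per_diagnosis = {name: binary_metrics(**c) for name, c in example_counts.items()}

# Unweighted (macro) average of F1 across diagnoses; an assumed scheme.
mean_f1 = mean(m.f1 for m in per_diagnosis.values())
```

An unweighted mean treats rare and common diagnoses equally; a prevalence-weighted mean would be an equally plausible reading of "overall mean F1 score".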
Keywords
artificial intelligence model, artificial intelligence