A PARADIGM FOR TESTING THE ACCURACY OF DIGITAL SLEEP STAGING SYSTEMS

SLEEP (2022)

Abstract

Introduction
Despite evidence that agreement between human scorers and some automatic staging systems is generally comparable to agreement among human scorers, automated scoring is rarely used in clinical practice, even though it saves time and is consistent. We propose a paradigm for testing digital systems that measures their accuracy against highly experienced academic scorers. As an example of a digital method to be tested, we used Michele Sleep Scoring (abbreviated: Digital).

Methods
Seventy polysomnograms (PSGs) were scored by 6 experienced technologists from 3 academic centers, and their staging was compared with Digital staging epoch by epoch. For each PSG we carried out 6 cycles of comparisons. Each cycle consisted of two steps: first, one scorer (the tested scorer) was compared with the scoring of the 5 remaining scorers (the judges); second, Digital, as the tested scorer, was compared with the same 5 judges. A type 1 error was assigned when all judges disagreed with the tested scorer and the judges also disagreed among themselves. A type 2 error was assigned when all judges disagreed with the tested scorer but agreed unanimously on the stage. For each PSG, the number of epochs with type 1 and type 2 errors was counted for each scorer (n = 6) and for Digital. Results of all 70 PSGs were pooled, and the percentages of type 1 and type 2 errors are reported for all scorers and for Digital.

Results
Seventy PSGs (all female participants, aged 51.1 ± 4.2 years) were evaluated. Mean times in stages N1, N2, N3 and REM (manual scoring) were 43 ± 18, 244 ± 47, 30 ± 21 and 81 ± 25 minutes, respectively. Total sleep time (TST) was 398 ± 52 minutes, and sleep efficiency was 84 ± 8%. In total, 65,053 epochs were scored by each scorer and by Digital. The mean percentage of type 1 errors across all epochs was 6.4% (range 0–33.2%) for the scorers vs. 7.8% (range 1.68–26.6%) for Digital. The mean percentage of type 2 errors was 3.9% (range 0–28.6%) for the scorers vs. 4.3% (range 0–17.3%) for Digital.
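The error classification and leave-one-out cycles described in Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code or part of Michele Sleep Scoring: it assumes each hypnogram is a list of stage labels (here "W", "N1", "N2", "N3", "R"), one per 30-second epoch, with all scorings of a PSG aligned to the same epochs. The function names `classify_epoch`, `count_errors`, `evaluate_psg` and `pooled_error_percentages` are hypothetical. Pooling here simply sums epoch counts over all PSGs; the averages and ranges reported in the abstract may have been computed per scorer or per PSG instead.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Assumed representation: one stage label per 30-second epoch.
Hypnogram = Sequence[str]


def classify_epoch(tested: str, judges: Sequence[str]) -> int:
    """Return 0 (no error), 1 (type 1 error) or 2 (type 2 error) for one epoch."""
    if tested in judges:
        return 0  # at least one judge agrees with the tested scorer
    if len(set(judges)) == 1:
        return 2  # every judge disagrees with the tested scorer, and the judges are unanimous
    return 1  # every judge disagrees with the tested scorer, but the judges disagree among themselves


def count_errors(tested: Hypnogram, judges: Sequence[Hypnogram]) -> Counter:
    """Tally error classes epoch by epoch for one tested hypnogram against a judge panel."""
    if any(len(j) != len(tested) for j in judges):
        raise ValueError("all hypnograms of a PSG must have the same number of epochs")
    return Counter(
        classify_epoch(stage, [j[i] for j in judges]) for i, stage in enumerate(tested)
    )


def evaluate_psg(scorers: Sequence[Hypnogram], digital: Hypnogram) -> Tuple[Counter, Counter]:
    """One cycle per human scorer: that scorer vs. the remaining scorers,
    then the automated hypnogram vs. the same judges."""
    human, auto = Counter(), Counter()
    for k, tested in enumerate(scorers):
        judges = [s for m, s in enumerate(scorers) if m != k]  # leave scorer k out
        human.update(count_errors(tested, judges))
        auto.update(count_errors(digital, judges))
    return human, auto


def pooled_error_percentages(
    psgs: Sequence[Tuple[Sequence[Hypnogram], Hypnogram]],
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Sum epoch tallies over all PSGs and express type 1 and type 2 errors as percentages."""
    human_total, auto_total = Counter(), Counter()
    for scorers, digital in psgs:
        human, auto = evaluate_psg(scorers, digital)
        human_total.update(human)
        auto_total.update(auto)

    def as_percent(counts: Counter) -> Dict[int, float]:
        n = sum(counts.values())  # number of epoch comparisons made
        return {t: (100.0 * counts[t] / n if n else 0.0) for t in (1, 2)}

    return as_percent(human_total), as_percent(auto_total)


if __name__ == "__main__":
    # Toy PSG: 6 human hypnograms of 4 epochs each, plus one automated hypnogram.
    scorers = [
        ["W", "N1", "N2", "R"],
        ["W", "N2", "N2", "R"],
        ["W", "N2", "N2", "R"],
        ["W", "N2", "N3", "R"],
        ["W", "N2", "N2", "R"],
        ["W", "N2", "N2", "R"],
    ]
    digital = ["W", "N3", "N2", "N2"]
    print(pooled_error_percentages([(scorers, digital)]))
```

In this toy example, every cycle compares 4 epochs, so each side makes 24 epoch comparisons in total. The human cycles yield no type 1 errors and 2 type 2 errors (about 8.3%); the automated hypnogram yields 5 type 1 errors (about 20.8%) and 7 type 2 errors (about 29.2%). Comparing Digital against exactly the same judge panels as each human scorer is what makes the two error rates directly comparable.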
Conclusion
This study provides an objective way of testing the accuracy of automated scoring systems and adds to the evidence that the accuracy of Michele Sleep Scoring is comparable to that of manual scoring.

Support
None