OCTess: AN OPTICAL CHARACTER RECOGNITION ALGORITHM FOR AUTOMATED DATA EXTRACTION OF SPECTRAL DOMAIN OPTICAL COHERENCE TOMOGRAPHY REPORTS.

Retina (Philadelphia, Pa.)(2024)

Abstract
PURPOSE: Manual extraction of data from spectral domain optical coherence tomography (SD-OCT) reports is time- and resource-intensive. This study aimed to develop an optical character recognition (OCR) algorithm for automated data extraction from Cirrus SD-OCT macular cube reports.

METHODS: SD-OCT monocular macular cube reports (n = 675) were randomly selected from a single-center database of patients from 2020 to 2023. Image processing and bounding box operations were performed, and Tesseract (an OCR library) was used to develop the algorithm, OCTess. The algorithm was validated using a separate test data set.

RESULTS: The long short-term memory deep learning version of Tesseract achieved the best performance. After reverifying all discrepancies between human and algorithmic data extractions, OCTess achieved accuracies of 100.00% and 99.98% in the training (n = 125) and testing (n = 550) data sets, respectively, while the human error rate was 1.11% (98.89% accuracy) and 0.49% (99.51% accuracy), respectively. OCTess extracted data in 3.1 seconds per report, compared with 94.3 seconds per report for human evaluators.

CONCLUSION: We developed an OCR and machine learning algorithm that extracted SD-OCT data with near-perfect accuracy, outperforming humans in both accuracy and efficiency. This algorithm can be used for efficient construction of large-scale SD-OCT data sets for researchers and clinicians.
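The abstract does not give implementation details, so the following is only a minimal sketch of the general technique it names: cropping fixed bounding boxes from a report image and passing each crop to Tesseract's LSTM engine. It is not the OCTess implementation. The field names, pixel coordinates, and the helper names `tesseract_lstm` and `extract_fields` are hypothetical, and the sketch assumes the `pytesseract` wrapper and a Pillow-style image object with a `crop` method.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels

# Hypothetical field locations; real coordinates depend on the report layout.
FIELD_BOXES: Dict[str, Box] = {
    "central_subfield_thickness": (100, 200, 180, 230),
    "cube_volume": (100, 240, 180, 270),
}


def tesseract_lstm(region) -> str:
    """OCR one cropped region with Tesseract's LSTM engine."""
    # Imported lazily so the rest of the sketch runs without Tesseract installed.
    import pytesseract

    # --oem 1 selects the LSTM engine; --psm 7 treats the crop as one text line.
    return pytesseract.image_to_string(region, config="--oem 1 --psm 7")


def extract_fields(
    image,
    boxes: Dict[str, Box] = FIELD_BOXES,
    ocr: Callable = tesseract_lstm,
) -> Dict[str, str]:
    """Crop each bounding box from the report image and OCR it."""
    return {name: ocr(image.crop(box)).strip() for name, box in boxes.items()}
```

With Pillow installed, a call might look like `extract_fields(Image.open("report.png").convert("L"))`, where the file name is a placeholder. Passing the OCR function as a parameter keeps the cropping logic testable without a Tesseract installation.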
Keywords
optical character recognition algorithm, automated data extraction, coherence