Accurate pattern-based extraction of complex Gleason score expressions from pathology reports

Joonas Miettinen, Tomas Tanskanen, Henna Degerlund, Aapeli Nevala, Nea Malila, Janne Pitkaniemi

Journal of Biomedical Informatics (2021)

Abstract
Purpose: The Gleason score is an important grading factor in prostate cancer. Gleason scores can be extracted from pathology report texts using regular expressions, but previously developed programmes have targeted only relatively simple Gleason score expressions. We developed a programme that can also extract complex expressions. The programme is relatively easy to adapt to other languages and datasets.

Methods: We developed and evaluated our regular expression-based programme using manually processed pathology reports of prostate cancer cases diagnosed in Finland in 2016-2017. Both simple and complex Gleason score expressions were targeted. We measured the performance of our programme using recall, precision, and the F1 score. The proportion of complex Gleason score expressions was estimated as the complement of the recall obtained when only addition expressions (e.g. "Gleason 3 + 4") were targeted.

Results: The detection of values (scores and score components) is based on mandatory keywords before or after the value. The programme favours precision over recall by primarily allowing lists of optional expressions between keyword-value pairs and only secondarily allowing arbitrary expressions. The programme is straightforward to adapt to new datasets by modifying the lists of mandatory and optional expressions. The full and addition-only programmes had recalls of 92% (95% CI: [90%, 95%]) and 65% ([61%, 70%]), respectively, and both had high precision (98% [97%, 99%] and 100% [99%, 100%], respectively). The estimated proportion of complex Gleason score expressions was therefore 100% - 65% = 35%.

Conclusions: Even complex Gleason score expressions can be extracted with high recall and precision using regular expressions. We recommend implementing automated Gleason score extraction where possible by adapting our validated programme.
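The keyword-value strategy described in the Results can be illustrated with a minimal sketch. It assumes a single mandatory keyword (`gleason`) that must precede a simple addition expression, a short hypothetical list of optional intervening expressions, and a looser fallback that permits up to 20 arbitrary non-digit characters between keyword and value. The keyword list, the optional-expression list, the 20-character limit and the function name `extract_gleason` are illustrative assumptions, not the authors' programme, and only the addition form (e.g. "3 + 4 = 7") is covered.

```python
import re
from typing import Optional

# Hypothetical mandatory keyword and optional-expression list; the paper's
# actual (largely Finnish-language) lists are not reproduced here.
KEYWORD = r"gleason"
OPTIONAL = [r"score", r"pisteet", r"summa", r"grade", r"[:=]"]

# Zero or more listed optional expressions, each possibly preceded by whitespace.
OPT = r"(?:\s*(?:" + "|".join(OPTIONAL) + r"))*"

# Addition expression "A + B" with an optional stated total "= C".
ADDITION = (
    r"(?P<primary>[1-5])\s*\+\s*(?P<secondary>[1-5])"
    r"(?:\s*=\s*(?P<total>\d{1,2}))?"
)

# Strict pattern: only listed optional expressions may sit between keyword and value.
STRICT = re.compile(KEYWORD + OPT + r"\s*" + ADDITION, re.IGNORECASE)
# Loose fallback: up to 20 arbitrary non-digit characters (assumed limit).
LOOSE = re.compile(KEYWORD + r"[^\d]{0,20}?" + ADDITION, re.IGNORECASE)


def extract_gleason(text: str) -> Optional[dict]:
    """Return the first Gleason addition expression, trying the strict pattern first."""
    for pattern in (STRICT, LOOSE):
        m = pattern.search(text)
        if m:
            primary = int(m.group("primary"))
            secondary = int(m.group("secondary"))
            total = m.group("total")
            return {
                "primary": primary,
                "secondary": secondary,
                # Fall back to the sum when no total is written out.
                "total": int(total) if total else primary + secondary,
                "strict": pattern is STRICT,
            }
    return None


if __name__ == "__main__":
    print(extract_gleason("Gleason score: 3 + 4 = 7"))
    print(extract_gleason("Gleason in all cores 4 + 4 = 8"))
```

Trying the strict pattern before the loose one mirrors the precision-over-recall ordering described above: a report is matched through arbitrary intervening text only when no listed optional expression explains the gap between keyword and value. Adapting such a sketch to a new dataset would mainly mean editing `KEYWORD` and `OPTIONAL`.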
Keywords
Gleason score, Regular expression, Information extraction, Pathology report, Natural language processing, Free-form text