Hybrid Text Segmentation for Hungarian Clinical Records.


Clinical documents are becoming widely available to researchers who aim to develop resources and tools that may help clinicians in their work. While several approaches exist for processing English medical text, there are only a few for other languages. Moreover, word and sentence segmentation are commonly treated as simple engineering issues. In this study, we describe the difficulties that arise when segmenting Hungarian clinical records and present a multi-step method that produces normalized and segmented text. Our approach is a hybrid combination of a rule-based and an unsupervised statistical solution. The presented system is compared with other available and commonly used algorithms. These fail to segment clinical text (all of them reach F-scores below 75%), while our method scores above 90%. Consequently, among the systems evaluated, only the hybrid tool described in this study is usable for segmenting Hungarian clinical texts in practical applications.
Key words
text segmentation, clinical records, sentence boundary detection, log-likelihood ratios
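The abstract names log-likelihood ratios as the statistical ingredient but does not spell out how they are used. As a rough illustration only, and not the authors' implementation, the sketch below computes Dunning's log-likelihood ratio (G²) for a 2×2 contingency table and applies it to the question of whether a token co-occurs with a following period more often than chance would predict, which is one common way to spot abbreviations in unsupervised sentence boundary detection. The function names, the whitespace-style token list, and the sample tokens are all hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: word followed by a period     k12: word not followed by a period
    k21: other token followed by one   k22: other token not followed by one
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def period_association(tokens, word):
    """Score the association between `word` and a following period.

    Hypothetical helper: `tokens` is an already tokenized list in which
    periods are separate tokens. A high score means the word and the period
    are not independent; it does not by itself say in which direction.
    """
    k11 = k12 = k21 = k22 = 0
    for current, following in zip(tokens, tokens[1:]):
        is_word = current == word
        has_period = following == "."
        if is_word and has_period:
            k11 += 1
        elif is_word:
            k12 += 1
        elif has_period:
            k21 += 1
        else:
            k22 += 1
    return log_likelihood_ratio(k11, k12, k21, k22)


# Made-up token stream: "dg" stands in for an abbreviation written with a period.
sample = ["dg", ".", "myopia", "dg", ".", "cataracta", "o", "u"]
abbreviation_score = period_association(sample, "dg")
ordinary_score = period_association(sample, "myopia")
```

In a hybrid setup of the kind the abstract describes, a score like this would typically be thresholded and combined with hand-written rules; the threshold and the rules themselves are not given in the abstract and are left out here.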