A Pre-trained Model for Chinese Medical Record Punctuation Restoration

Zhipeng Yu, Tongtao Ling, Fangqing Gu, Huangxu Sheng, Yi Liu

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII (2024)

Abstract
In the medical field, text produced by automatic speech recognition (ASR) is poorly readable because it lacks punctuation; worse, this can lead patients to misunderstand a doctor's orders. Restoring punctuation after ASR to enhance readability is therefore an indispensable step. Most recent work fine-tunes pre-trained models on downstream tasks, but these models lack knowledge of the relevant domain. Furthermore, most research targets English; there is less work on Chinese, and even less in the medical field. Motivated by this, we add Chinese medical data to the model's pre-training stage and introduce punctuation restoration as a pre-training task. Specifically, we propose a Punctuation Restoration Pre-training Mask Language Model (PRMLM) task and apply contrastive learning at this stage to strengthen the model. We then propose a Punctuation Prior Knowledge Fine-tuning (PKF) method so that contrastive learning is better exploited when fine-tuning on the downstream punctuation restoration task. On our medical-domain dataset, a series of comparisons with existing algorithms verifies the proposed method's effectiveness.
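The abstract frames punctuation restoration after ASR as predicting, for each character of unpunctuated text, which punctuation mark (if any) should follow it. The paper's PRMLM and PKF specifics are not given here, so the sketch below only illustrates the standard data preparation for this formulation: stripping punctuation from supervised text and emitting per-character labels. The label names and the function are hypothetical, not from the paper.

```python
# Illustrative sketch (not the paper's code): turn punctuated Chinese text
# into a (characters, labels) pair for token-level punctuation restoration.
# Each label says which punctuation mark follows that character; "O" = none.
PUNCT_LABELS = {"，": "COMMA", "。": "PERIOD", "？": "QUESTION"}

def make_restoration_example(text):
    """Strip punctuation from `text`; label each remaining character
    with the mark that originally followed it ("O" if none)."""
    chars, labels = [], []
    for ch in text:
        if ch in PUNCT_LABELS:
            if labels:                      # attach mark to the preceding character
                labels[-1] = PUNCT_LABELS[ch]
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels

chars, labels = make_restoration_example("医生说，按时服药。")
# chars  -> ['医', '生', '说', '按', '时', '服', '药']
# labels -> ['O', 'O', 'COMMA', 'O', 'O', 'O', 'PERIOD']
```

A sequence model (e.g. a BERT-style encoder) would then be trained to predict these labels from the unpunctuated characters; the paper additionally injects this objective into pre-training on Chinese medical text.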
Keywords
Punctuation restoration, Automatic speech recognition, Pre-training mask language model, Supervised contrastive learning