A domain adaptive pre-training language model for sentence classification of Chinese electronic medical record.

2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Abstract
Accurately extracting and classifying Chinese electronic medical records (EMRs), which contain vast amounts of valuable medical information, has promising practical applications and medical value for health care in China. While this pivotal issue has attracted growing attention, the bulk of current research focuses on operations at the document or entity level within medical records. Only a limited body of work addresses the problem at the sentence level, a critical aspect for downstream tasks such as medical information retrieval, diagnosis normalization, and question answering. In this paper, we present a domain-adaptive pre-trained language model named CEMR-LM for sentence classification of Chinese EMRs. CEMR-LM acquires Chinese medical domain knowledge by pre-training the language model on a large unlabeled clinical corpus. This is reinforced by combining a fine-tuning strategy with a dual-channel mechanism, which together contribute to the model's improved performance. Experiments on both a benchmark dataset and a real-world hospital dataset demonstrate that CEMR-LM outperforms state-of-the-art methods. Furthermore, CEMR-LM can highlight indicative elements in medical records by visualizing the attention weights embedded in the model. The implemented code and experimental datasets are available online at https://github.com/BioMedBigDataCenter/CEMR-LM.
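Domain-adaptive pre-training of the kind described in the abstract is typically carried out with BERT-style masked language modeling over the unlabeled clinical corpus. As a minimal sketch of that data-preparation step (assuming the standard scheme of masking 15% of tokens, split 80/10/10 between `[MASK]`, a random vocabulary token, and the original token), the function below shows how masked inputs and their reconstruction labels could be generated; the names `mask_tokens`, `MASK`, and the example sentence are illustrative assumptions, not the paper's actual implementation.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style dynamic masking for masked language modeling.

    Returns (masked_tokens, labels): labels[i] holds the original
    token at masked positions and None elsewhere, so the pre-training
    loss is computed only where tokens were masked.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)              # predict the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)         # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random token
            else:
                masked.append(tok)          # 10%: keep original token
        else:
            masked.append(tok)
            labels.append(None)             # position not scored
    return masked, labels

# Character-level tokens, as commonly used for Chinese clinical text.
tokens = list("患者主诉头痛三天")
vocab = list("患者主诉头痛三天发热咳嗽")
masked, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

The downstream sentence-classification fine-tuning then reuses the encoder weights learned from this objective, which is how the model absorbs medical domain knowledge before seeing any labeled sentences.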
Keywords
Domain adaptive pre-training, Language model, Sentence classification, Chinese electronic medical record