Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Bingyan Song, Zhenshan Bao, YueZhang Wang, Wenbo Zhang, Chao Sun

International Conference Natural Language Processing (2020)

Abstract
Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work relies on statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to construct a large-scale corpus for obtaining pre-trained vectors. We have released the corpus and pre-trained vectors to the public.
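The abstract does not specify how lexicon information enters the representation layer. One widely used scheme for Chinese character-based NER is SoftLexicon-style matching: every lexicon word found in the sentence is assigned to the characters it covers, grouped by whether the character begins (B), sits inside (M), ends (E), or alone forms (S) the word, and each group's word vectors are pooled and concatenated to the character vector before the BiLSTM. The sketch below illustrates that general idea under stated assumptions; it is not presented as the authors' exact method. The functions `match_lexicon`, `mean_vector`, and `enrich`, the `max_len` limit, the unweighted averaging, and all toy vectors are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

Vector = List[float]
TAGS = "BMES"  # Begin, Middle, End, Single-character word


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """For every character, collect the lexicon words that cover it,
    grouped by the character's position inside each word (hypothetical helper)."""
    slots = [{tag: set() for tag in TAGS} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i, up to max_len characters (assumed limit).
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                slots[i]["S"].add(word)
            else:
                slots[i]["B"].add(word)
                for k in range(i + 1, j - 1):
                    slots[k]["M"].add(word)
                slots[j - 1]["E"].add(word)
    return slots


def mean_vector(words: Set[str], word_vecs: Dict[str, Vector], dim: int) -> Vector:
    """Unweighted average of word vectors; a zero vector if the set is empty.
    (SoftLexicon itself weights by word frequency; plain averaging is a simplification.)"""
    if not words:
        return [0.0] * dim
    total = [0.0] * dim
    for w in words:
        for d, x in enumerate(word_vecs[w]):
            total[d] += x
    return [x / len(words) for x in total]


def enrich(sentence: str, char_vecs: Dict[str, Vector], word_vecs: Dict[str, Vector],
           lexicon: Set[str], word_dim: int) -> List[Vector]:
    """Concatenate each character vector with its pooled B, M, E and S word vectors."""
    features = []
    for ch, slot in zip(sentence, match_lexicon(sentence, lexicon)):
        vec = list(char_vecs[ch])
        for tag in TAGS:
            vec.extend(mean_vector(slot[tag], word_vecs, word_dim))
        features.append(vec)
    return features


# Hypothetical toy data: a short TCM-style phrase with 2-dimensional vectors.
sentence = "太阳病发热"
lexicon = {"太阳", "太阳病", "病", "发热"}
char_vecs = {"太": [1.0, 0.0], "阳": [0.0, 1.0], "病": [1.0, 1.0],
             "发": [0.5, 0.5], "热": [0.0, 0.5]}
word_vecs = {"太阳": [1.0, 2.0], "太阳病": [3.0, 4.0], "病": [0.0, 1.0], "发热": [2.0, 2.0]}

features = enrich(sentence, char_vecs, word_vecs, lexicon, word_dim=2)
# Each character now has 2 + 4 * 2 = 10 features; e.g. "太" begins both "太阳" and "太阳病",
# so its B slot holds their average [2.0, 3.0].
```

In a full model, these enriched per-character vectors would replace plain character embeddings as input to the BiLSTM, with a CRF layer on top for tag decoding. Keeping the four position groups separate lets the encoder distinguish, for example, a character that starts a longer entity from one that ends it.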
Keywords
Named entity recognition, Enhanced embedding, BiLSTM-CRF, TCM books, Information extraction