Pronounce differently, mean differently: A multi-tagging-scheme learning method for Chinese NER integrated with lexicon and phonetic features

Information Processing & Management(2022)

引用 5|浏览93
暂无评分
摘要
Named Entity Recognition (NER) aims to automatically extract specific entities from the unstructured text. Compared with performing NER in English, Chinese NER is more challenging in recognizing entity boundaries because there are no explicit delimiters between Chinese characters. However, most previous researches focused on the semantic information of the Chinese language on the character level but ignored the importance of the phonetic characteristics. To address these issues, we integrated phonetic features of Chinese characters with the lexicon information to help disambiguate the entity boundary recognition by fully exploring the potential of Chinese as a pictophonetic language. In addition, a novel multi-tagging-scheme learning method was proposed, based on the multi-task learning paradigm, to alleviate the data sparsity and error propagation problems that occurred in the previous tagging schemes, by separately annotating the segmentation information of entities and their corresponding entity types. Extensive experiments performed on four Chinese NER benchmark datasets: OntoNotes4.0, MSRA, Resume, and Weibo, show that our proposed method consistently outperforms the existing state-of-the-art baseline models. The ablation experiments further demonstrated that the introduction of the phonetic feature and the multi-tagging-scheme has a significant positive effect on the improvement of the Chinese NER task.
更多
查看译文
关键词
Named entity recognition,Phonetic feature,Lexicon feature,Multiple tagging schemes,Natural language processing,Information extraction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要