Ancient Chinese Word Segmentation and Part-of-Speech Tagging Using Distant Supervision

arxiv(2023)

引用 0|浏览23
暂无评分
摘要
Ancient Chinese word segmentation (WSG) and part-of-speech tagging (POS) are important to study ancient Chinese, but the amount of ancient Chinese WSG and POS tagging data is still rare. In this paper, we propose a novel augmentation method of ancient Chinese WSG and POS tagging data using distant supervision over parallel corpus. However, there are still mislabeled and unlabeled ancient Chinese words inevitably in distant supervision. To address this problem, we take advantage of the memorization effects of deep neural networks and a small amount of annotated data to get a model with much knowledge and a little noise, and then we use this model to relabel the ancient Chinese sentences in parallel corpus. Experiments show that the model trained over the relabeled data outperforms the model trained over the data generated from distant supervision and the annotated data. Our code is available at https://github.com/farlit/ACDS.
更多
查看译文
关键词
Ancient Chinese,word segmentation and POS tagging,distant supervision,relabeling
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要