Generalization Ability of CNN-Based Morpheme Segmentation

2023 Ivannikov Ispras Open Conference (ISPRAS) (2023)

Abstract
Determining the morphemic structure of a word is a problem of particular relevance in teaching the Russian language. Automatic evaluation of this structure is complicated by the lack of agreement among linguists on some complex cases. Nevertheless, several papers published in recent years apply various machine learning methods to this problem in practical applications. The authors of [1] propose an architecture based on convolutional neural networks for segmenting Russian lemmas. The proposed algorithm has shown quality sufficient for solving various applied problems. However, its generalization ability on morphemes not seen during training remains unclear. In this paper, we find that the algorithm's word accuracy drops by 16-18% when it is tested on words whose roots are absent from the training sample. Given the algorithm's significant robustness to a uniform reduction of the training sample, we conclude that the training dataset for the studied model can be small but should be as diverse as possible.
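The evaluation setup described above, testing on words whose roots never occur in training and scoring by word accuracy, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data format (a word as a list of (morpheme, type) pairs), the type labels "PREF", "ROOT", "SUFF", "END", the function names, and the toy word list are all assumptions made for the example.

```python
import random

# Toy, illustrative data (not from the paper's dataset): each word is a
# list of (morpheme, type) pairs with hypothetical type labels.
WORDS = [
    [("под", "PREF"), ("вод", "ROOT"), ("н", "SUFF"), ("ый", "END")],
    [("вод", "ROOT"), ("н", "SUFF"), ("ый", "END")],
    [("лес", "ROOT"), ("н", "SUFF"), ("ой", "END")],
    [("лес", "ROOT"), ("ник", "SUFF")],
    [("дом", "ROOT"), ("ик", "SUFF")],
    [("дом", "ROOT"), ("ашн", "SUFF"), ("ий", "END")],
]


def roots_of(segmentation):
    """Return the set of root morphemes in one segmented word."""
    return {morph for morph, kind in segmentation if kind == "ROOT"}


def split_by_root(words, test_fraction=0.3, seed=0):
    """Split words so that held-out roots never appear in the training part.

    A word goes to the test part if it contains at least one held-out root.
    """
    all_roots = sorted(set().union(*(roots_of(w) for w in words)))
    rng = random.Random(seed)
    rng.shuffle(all_roots)
    n_held = max(1, int(len(all_roots) * test_fraction))
    held_out = set(all_roots[:n_held])
    train = [w for w in words if not (roots_of(w) & held_out)]
    test = [w for w in words if roots_of(w) & held_out]
    return train, test, held_out


def word_accuracy(predicted, gold):
    """Fraction of words whose whole predicted segmentation matches the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


train, test, held_out = split_by_root(WORDS, test_fraction=0.34, seed=1)
```

Comparing word_accuracy on such a root-disjoint test part with the score on an ordinary random split is one way to measure the kind of drop the abstract reports. Word accuracy is strict: a single misplaced boundary or wrong morpheme type makes the whole word count as an error.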