A Study on the Performance of Recurrent Neural Network based Models in Maithili Part of Speech Tagging

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING (2023)

Abstract
This article presents our effort in developing a Maithili Part of Speech (POS) tagger. Substantial effort has been devoted to developing POS taggers for several Indian languages, including Hindi, Bengali, Tamil, Telugu, Kannada, Punjabi, and Marathi, but Maithili has not received much attention from the research community. Maithili is one of the official languages of India, with around 50 million native speakers. We therefore developed a POS tagger for Maithili. For the development, we use a manually annotated in-house Maithili corpus containing 56,126 tokens. The tagset contains 27 tags. We train a conditional random fields (CRF) classifier as a baseline system, which achieves an accuracy of 82.67%. Then, we employ several recurrent neural network (RNN)-based models, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), LSTM with a CRF layer (LSTM-CRF), and GRU with a CRF layer (GRU-CRF), and perform a comparative study. We also study the effect of both word embeddings and character embeddings on the task. The highest accuracy of the system is 91.53%.
Keywords
Part of speech tagging, Maithili language, neural model for NLP, recurrent neural network