Part-of-Speech Tagger for Bodo Language using Deep Learning approach
CoRR (2024)
Abstract
Language processing tasks such as part-of-speech (POS) tagging, named entity
recognition, machine translation, speech recognition, and language modeling
(LM) are well studied in high-resource languages. Nevertheless, research on
these tasks for several low-resource languages, including Bodo, Mizo,
Nagamese, and others, is either yet to commence or is in its nascent stages.
Language models play a vital role in the downstream tasks of modern NLP.
Extensive studies have been carried out on LMs for high-resource languages.
Nevertheless, languages such as Bodo, Rabha, and Mising continue to lack
coverage. In this study, we first present BodoBERT, a language model for the
Bodo language. To the best of our knowledge, this work is the first such effort
to develop a language model for Bodo. Secondly, we present an ensemble DL-based
POS tagging model for Bodo. The POS tagging model is based on combinations of
BiLSTM with CRF and stacked embedding of BodoBERT with BytePairEmbeddings. We
evaluate several language models in the experiments to see how well they
perform on the POS tagging task. The best-performing model achieves an F1 score of 0.8041. A
comparative experiment was also conducted on Assamese POS taggers, considering
that the language is spoken in the same region as Bodo.
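
The abstract does not spell out how the stacked embedding is built. A common reading, assumed here, is that each token's vectors from the individual embedders are concatenated into one feature vector before being passed to the BiLSTM-CRF tagger. The minimal sketch below illustrates only that concatenation step; the lookup tables, token names, and dimensions are hypothetical stand-ins, not BodoBERT or BytePairEmbeddings outputs.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Return a toy embedder: look up a token, fall back to a zero vector."""
    def embed(token: str) -> Vector:
        return list(table.get(token, [0.0] * dim))
    return embed


def stack_embeddings(tokens: List[str], embedders: List[Embedder]) -> List[Vector]:
    """Concatenate, per token, the vectors produced by every embedder."""
    stacked: List[Vector] = []
    for token in tokens:
        vector: Vector = []
        for embed in embedders:
            vector.extend(embed(token))
        stacked.append(vector)
    return stacked


# Hypothetical stand-ins for a contextual LM embedder and a subword embedder.
contextual = make_lookup_embedder({"w1": [0.1, 0.2, 0.3]}, dim=3)
subword = make_lookup_embedder({"w1": [0.9, 0.8], "w2": [0.5, 0.4]}, dim=2)

# Each token ends up with a 3 + 2 = 5 dimensional feature vector.
features = stack_embeddings(["w1", "w2"], [contextual, subword])
```

Under this assumption, the stacked dimension is the sum of the component dimensions, so a token missing from one embedder still receives features from the other.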