Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)

Abstract
Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri-Spanish, Guarani-Spanish, Quechua-Spanish, and Shipibo-Konibo-Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.
Keywords
pretrained models, languages, alignments, low-resource