UZNER: A Benchmark for Named Entity Recognition in Uzbek.

Aizihaierjiang Yusufu, Liu Jiang, Abidan Ainiwaer, Chong Teng, Aizierguli Yusufu, Fei Li, Donghong Ji

NLPCC (1) (2023)

Abstract
Named entity recognition (NER) is a key task in natural language processing: recognized entities provide semantic information that many downstream tasks depend on. However, NER performance is often limited by the availability of language resources. For low-resource languages, NER systems usually perform poorly because sufficient labeled data and pre-trained models are lacking. To address this issue, we manually constructed a large-scale, high-quality NER corpus for Uzbek and experimented with various NER methods. We improved state-of-the-art baseline models by introducing additional features and data translation. Data translation enables the model to learn richer syntactic structure and semantic information. Affix features provide knowledge at the morphological level and play an important role in identifying oversimplified low-frequency entity labels. Our data and models will be made available to facilitate research on low-resource NER.
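The abstract does not specify how the affix features are built or integrated into the baselines. As a rough illustration only, the sketch below shows the general kind of token-level morphological cues such features capture in an agglutinative language like Uzbek. The helper names (`affix_features`, `sentence_features`) and the short suffix list are hypothetical, chosen for demonstration, and are not the paper's affix inventory or method. The suffix match is naive (longest string match, no morphological analysis).

```python
from typing import Dict, List

# Illustrative Uzbek (Latin-script) case and plural suffixes.
# ASSUMPTION: this list is for demonstration only, not the paper's affix set.
SUFFIXES = ["ning", "dan", "lar", "da", "ga", "ni"]


def affix_features(token: str, max_len: int = 3) -> Dict[str, str]:
    """Return simple morphological features for one token (hypothetical helper)."""
    lower = token.lower()
    feats: Dict[str, str] = {}
    # Character prefixes and suffixes of length 1..max_len.
    for n in range(1, max_len + 1):
        if len(lower) >= n:
            feats[f"prefix{n}"] = lower[:n]
            feats[f"suffix{n}"] = lower[-n:]
    # Longest suffix from the illustrative list that leaves a non-empty stem.
    matched = ""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if lower.endswith(suf) and len(lower) > len(suf):
            matched = suf
            break
    feats["known_suffix"] = matched
    # Capitalization cue, often correlated with named entities.
    feats["is_title"] = str(token[:1].isupper())
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Compute affix features for every token in a tokenized sentence."""
    return [affix_features(t) for t in tokens]


if __name__ == "__main__":
    # "Toshkentdan keldim" = "I came from Tashkent"; -dan marks the ablative case.
    for feats in sentence_features(["Toshkentdan", "keldim"]):
        print(feats)
```

For the location "Toshkentdan", the sketch reports `known_suffix` as `dan`, separating the case ending from the stem "Toshkent". This is the sort of signal that can help a model generalize to inflected forms of entities it has rarely seen. Feature dictionaries in this shape could, for example, be passed to a feature-based tagger such as a CRF. How the paper combines affix information with its neural baselines is not described in this abstract.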
Key words
named entity recognition, Uzbek, benchmark