Creating Data in Icelandic for Text Normalization.

Helga Svala Sigurðardóttir, Anna Björk Nikulásdóttir, Jón Guðnason

NoDaLiDa (2021)

Abstract
There is no natural way to acquire normalized data, so we try to create data that is good enough to support more advanced methods for text normalization. We manually annotated the first normalized corpus in Icelandic, 40,000 sentences, and developed Regína, a rule-based system for text normalization. Regína achieves 90.83% accuracy on non-standard words when compared to the manually annotated corpus. Regína showed a significant improvement in accuracy when compared to an older normalization system for Icelandic. The normalized corpus and Regína will be released as open source.