Correcting Wide-Range of Arabic Spelling Mistakes Using Machine Learning and Transformers.

Raghda Diab Hasan, Gheith A. Abandah

ICIT (2023)

Abstract
Different languages are subject to several types of spelling mistakes. In this paper, we use a deep neural network model called the transformer to correct a wide range of Arabic soft spelling mistakes, including lexical and semantic errors (due to the sound of the letter), keyboard errors (due to the character's position on the keyboard), common typing errors (such as deleted spaces, preposition errors, and third-person pronoun errors), and some random typing errors in which the user replaces a letter with another letter. Our approach has two stages. In the first stage, we take the large, error-free Wiki-40B dataset of Arabic texts, inject synthetic targeted errors into it, and then train and test on the resulting dataset. In the second stage, we start from the weights trained in the first stage and train and test on a dataset of real Arabic spelling mistakes that we collected for this purpose. We call this dataset REALMS, which stands for REal Arabic Language Mistakes in Spelling. The proposed model corrects 99.12% of the artificial errors injected into the Wiki-40B dataset. It also achieves a character error rate of 1.14% and an accuracy of 98.85% on the REALMS dataset.
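The two ideas the abstract relies on, injecting synthetic errors into clean text and scoring output by character error rate, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CONFUSABLE` table, the `inject_errors` function, and the `rate` parameter are hypothetical and are not the authors' error model, and the character error rate is computed with the common definition (Levenshtein distance divided by reference length), which the abstract does not spell out.

```python
import random

# Hypothetical confusion sets of Arabic letters that are easily mistaken for
# one another (illustrative only; not the error inventory used in the paper).
CONFUSABLE = {
    "ا": "أإآ", "أ": "اإ", "إ": "اأ",
    "ة": "ه", "ه": "ة",
    "ى": "ي", "ي": "ى",
}


def inject_errors(text, rate=0.1, rng=None):
    """Return a copy of `text` with synthetic spelling errors.

    Each character is selected with probability `rate`. A selected space is
    deleted, a selected letter found in CONFUSABLE is swapped for one of its
    look-alikes, and any other selected character is left unchanged.
    """
    rng = rng or random.Random()
    out = []
    for ch in text:
        if rng.random() < rate:
            if ch == " ":
                continue  # simulate a deleted space
            if ch in CONFUSABLE:
                out.append(rng.choice(CONFUSABLE[ch]))
                continue
        out.append(ch)
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    clean = "ذهب الولد إلى المدرسة"
    noisy = inject_errors(clean, rate=0.3, rng=random.Random(0))
    print(noisy)
    print(f"CER of the noisy text against the clean text: {cer(clean, noisy):.2%}")
```

In a setup like the one described, pairs of noisy and clean text produced this way would serve as model input and target, and the same `cer` function could score the model's corrections against the clean reference.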
Keywords
Arabic, Spelling mistakes, Transformer, Machine learning