Investigating hybrid approaches for Arabic text diacritization with recurrent neural networks

2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), 2017

Abstract
Deep neural networks are now used effectively to solve many complex problems, including the automatic diacritization of Arabic text. This paper investigates a hybrid approach to this problem based on a recurrent neural network (RNN). We use the MADAMIRA full morphological and syntactic analyzer to assist the RNN. Only the high-confidence diacritics and word segmentation output by this analyzer are fed to the RNN, which generates the fully diacritized output. On the LDC ATB3 benchmark, the suggested hybrid approach performs better than the statistical approach. It achieves a diacritic error rate of 2.39% and a word error rate of 8.40%, which are improvements of 34% and 26%, respectively, over the best previous hybrid results. We implemented the RNN using parallel software and hardware: the CURRENNT library runs the RNN on a GPU with 16 streaming multiprocessors. Compared with the previous RNN-based system, our solution is 326 times faster to train and takes an average of 0.003 seconds to diacritize a word. This speed makes it feasible to train on very large data sets and thus to build larger and more accurate deep neural networks.
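The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the paper's evaluation code. It assumes that the diacritic error rate is the fraction of base characters whose attached diacritics differ from the reference, that the word error rate is the fraction of words with at least one such character, and that diacritics are the Unicode marks U+064B to U+0652. The function names split_marks and error_rates are hypothetical, and the paper's exact counting rules may differ.

```python
# Assumption: Arabic diacritics are the combining marks U+064B..U+0652
# (tanween, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_marks(word):
    """Return a list of [base_char, marks] pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            pairs[-1][1] += ch          # attach the mark to the previous base
        else:
            pairs.append([ch, ""])      # start a new base character
    return pairs


def error_rates(reference, hypothesis):
    """Return (diacritic_error_rate, word_error_rate) as fractions.

    Both strings must contain the same words once diacritics are removed.
    """
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("inputs must be non-empty with equal word counts")

    chars = char_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs, hyp_pairs = split_marks(ref_word), split_marks(hyp_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base characters differ between the inputs")
        # Compare marks as sorted strings so that mark order is ignored.
        wrong = sum(
            sorted(rm) != sorted(hm)
            for (_, rm), (_, hm) in zip(ref_pairs, hyp_pairs)
        )
        chars += len(ref_pairs)
        char_errors += wrong
        word_errors += 1 if wrong else 0
    return char_errors / chars, word_errors / len(ref_words)


# Two words, eight base characters; only the final case ending differs.
FATHA, DAMMA = "\u064E", "\u064F"
ref = ("\u0643" + FATHA + "\u062A" + FATHA + "\u0628" + FATHA + " "
       "\u0627\u0644\u0648" + FATHA + "\u0644" + FATHA + "\u062F" + DAMMA)
hyp = ref[:-1] + FATHA
der, wer = error_rates(ref, hyp)   # der == 0.125, wer == 0.5
```

In this example a single wrong case ending yields a diacritic error rate of 1/8 but a word error rate of 1/2. This shows how a system's word error rate can be several times its diacritic error rate, as in the figures above.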
Keywords
automatic diacritization, Arabic text, CURRENNT, hybrid approach, recurrent neural network, sequence transcription