Persian Word Embedding Evaluation Benchmarks

26TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2018)

Abstract
Recently, there has been renewed interest in semantic word representations, also called word embeddings, for a wide variety of natural language processing tasks that require sophisticated semantic and syntactic information. The quality of word embedding methods is usually evaluated on English-language benchmarks. Nevertheless, only a few studies analyze word embeddings for low-resource languages such as Persian. In this paper, we perform such an extensive word embedding evaluation for the Persian language on a set of lexical semantics tasks: analogy, concept categorization, and word semantic relatedness. For these evaluation tasks, we provide three benchmark data sets to show the strengths and weaknesses of five well-known embedding models trained on the Wikipedia corpus. The experimental results indicate that FastText (sg) and Word2Vec (cbow) outperform the other models.
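The analogy task mentioned above is typically scored by vector arithmetic: given a pair relation such as "man : king :: woman : ?", the answer is the vocabulary word whose vector is closest (by cosine similarity) to vec(king) - vec(man) + vec(woman). A minimal sketch of this scoring rule, using toy 2-dimensional vectors in place of real trained embeddings (the words and vector values here are illustrative, not from the paper's benchmarks):

```python
import numpy as np

# Toy embedding table; real evaluations would load trained
# Word2Vec / GloVe / FastText vectors instead.
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([0.0, 2.0]),
    "apple": np.array([3.0, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, emb):
    """Answer 'a : b :: c : ?' as the word nearest to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("man", "king", "woman", emb))  # queen
```

Benchmark accuracy is then the fraction of analogy questions for which this nearest-neighbour answer matches the gold word.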
Keywords
Word Embedding, Evaluation Benchmark, Word2Vec, GloVe, FastText