Performance Analysis of Different Word Embedding Models on Bangla Language

2018 International Conference on Bangla Speech and Language Processing (ICBSLP)

Cited by 11
Abstract
In this paper we discuss the performance of three word embedding models on a Bangla corpus: word2vec in TensorFlow, word2vec from the Gensim package, and the FastText model. Word embedding is a major part of natural language processing research, and much work has focused on finding appropriate methods for the word clustering process. Previously, n-gram models were used for this purpose, but with the improvement of deep learning methods, dynamic word clustering models are now preferred because they reduce processing time and improve memory efficiency. We apply all three models to the same dataset, a Bangla corpus containing 521,391 unique words, to produce word clusters, and we evaluate their performance in terms of accuracy and efficiency.
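As a brief illustration of the training objective underlying both word2vec and FastText, the sketch below generates skip-gram (center, context) training pairs from a tokenized sentence. The toy English sentence and function name are illustrative assumptions, not taken from the paper; the paper's models train on Bangla text.

```python
# Minimal sketch of skip-gram training-pair generation, the objective used
# by word2vec and FastText. The toy corpus and names are illustrative only.

def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

sentence = "word embedding maps words to vectors".split()
print(skipgram_pairs(sentence, window=1)[:4])
# → [('word', 'embedding'), ('embedding', 'word'),
#    ('embedding', 'maps'), ('maps', 'embedding')]
```

CBOW, the other architecture named in the keywords, inverts this relationship: it predicts the center word from the surrounding context words rather than the reverse.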
Keywords
Natural Language Processing (NLP), machine learning, deep learning, word cluster, word embedding, Bangla word clustering, word2vec, FastText, skip-gram, CBOW, GloVe