Learning Distributed Representations Of Uyghur Words And Morphemes

Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data: 14th China National Conference, CCL 2015 and Third International Symposium, NLP-NABD 2015, Guangzhou, China, November 13-14, 2015, Proceedings(2015)

引用 3|浏览63
暂无评分
摘要
While distributed representations have proven to be very successful in a variety of NLP tasks, learning distributed representations for agglutinative languages such as Uyghur still faces a major challenge: most words are composed of many morphemes and occur only once on the training data. To address the data sparsity problem, we propose an approach to learn distributed representations of Uyghur words and morphemes from unlabeled data. The central idea is to treat morphemes rather than words as the basic unit of representation learning. We annotate a Uyghur word similarity dataset and show that our approach achieves significant improvements over CBOW, a state-of-the-art model for computing vector representations of words.
更多
查看译文
关键词
Distributed representations,Uyghur,Word,Morpheme
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要