Blind signal decomposition of various word embeddings based on join and individual variance explained

Yikai Wang, Weijian Li

arxiv(2020)

引用 0|浏览1
暂无评分
摘要
In recent years, natural language processing (NLP) has become one of the most important areas with various applications in human's life. As the most fundamental task, the field of word embedding still requires more attention and research. Currently, existing works about word embedding are focusing on proposing novel embedding algorithms and dimension reduction techniques on well-trained word embeddings. In this paper, we propose to use a novel joint signal separation method - JIVE to jointly decompose various trained word embeddings into joint and individual components. Through this decomposition framework, we can easily investigate the similarity and difference among different word embeddings. We conducted extensive empirical study on word2vec, FastText and GLoVE trained on different corpus and with different dimensions. We compared the performance of different decomposed components based on sentiment analysis on Twitter and Stanford sentiment treebank. We found that by mapping different word embeddings into the joint component, sentiment performance can be greatly improved for the original word embeddings with lower performance. Moreover, we found that by concatenating different components together, the same model can achieve better performance. These findings provide great insights into the word embeddings and our work offer a new of generating word embeddings by fusing.
更多
查看译文
关键词
various word embeddings,blind signal decomposition
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要