
Improving Word Representation Quality Trained By Word2vec Via A More Efficient Hierarchical Clustering Method

Cooperative Design, Visualization, and Engineering: 15th International Conference, CDVE 2018 (2018)

Abstract
In traditional word2vec methods, the hierarchical softmax algorithm builds a Huffman tree over the whole vocabulary, so each word pair is trained in time logarithmic in the vocabulary size. However, the Huffman tree does not take into account how the words in the corpus cooperate with one another, which degrades both the language model and the trained word vectors. In this paper, we replace the original Huffman-tree construction with a purely data-driven method for rebuilding the binary tree. The new construction method uses the semantic and syntactic cooperation of words to cluster them hierarchically. This cooperation is reflected in the word vectors collected from an initial Huffman-tree training procedure. Our method substantially improves the performance of word vectors on semantic and syntactic tasks.
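
The contrast between the two tree constructions can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure, so the greedy centroid merging with cosine similarity below is an assumption, as are the function names (`huffman_codes`, `cluster_codes`, `assign_codes`) and the toy counts and vectors. It shows only how a frequency-based Huffman tree and a vector-based clustered tree assign different binary codes to the same words.

```python
import heapq
import itertools
import math


def assign_codes(tree, prefix=""):
    """Walk a nested-tuple binary tree; leaves are word strings."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


def huffman_codes(freqs):
    """Build a Huffman tree from word counts and return {word: code}."""
    tie = itertools.count()  # tie-breaker so heapq never compares subtrees
    heap = [(count, next(tie), word) for word, count in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tie), (left, right)))
    return assign_codes(heap[0][2])


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_codes(vectors):
    """Assumed procedure: repeatedly merge the two most similar clusters."""
    # Each cluster is (subtree, centroid, number of words).
    clusters = [(word, list(vec), 1) for word, vec in vectors.items()]
    while len(clusters) > 1:
        i, j = max(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (t1, c1, n1), (t2, c2, n2) = clusters[i], clusters[j]
        centroid = [(a * n1 + b * n2) / (n1 + n2) for a, b in zip(c1, c2)]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((t1, t2), centroid, n1 + n2)]
    return assign_codes(clusters[0][0])


# Hypothetical toy data: corpus counts, and vectors standing in for the
# output of an initial Huffman-tree training pass.
freqs = {"cat": 50, "dog": 5, "run": 40, "walk": 6}
vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "run": [0.1, 1.0],
    "walk": [0.2, 0.9],
}

huff = huffman_codes(freqs)
clustered = cluster_codes(vectors)
print("Huffman:  ", huff)
print("Clustered:", clustered)
```

On this toy input, the Huffman tree makes "dog" and "walk" siblings only because both are rare, whereas the clustered tree makes "cat" and "dog" siblings, and likewise "run" and "walk", because their vectors are close. In hierarchical softmax, words that share a code prefix share inner-node parameters, which is why the shape of the tree can affect the learned vectors.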
Keywords
Language models, Hierarchical neural network, word2vec