Hypernym Information and Sentiment Bias Probing in Distributed Data Representation.

ICMLC(2023)

Abstract
Neural word embedding vectors have been extensively investigated with probing tasks that test whether they contain semantic and syntactic information. Perhaps the most popular task is the gender-relation test “king - man + woman ≈ queen”; other probes include tests of the singular/plural relation (apple∼apples), analogy (good:better∼rough:X), and the purity of clusters of word embeddings (categorization). Here, we propose two novel probing tasks to evaluate the compositionality of word embeddings. First, we propose probing tests to evaluate whether word embeddings contain structural information about their hypernyms; more specifically, whether it is possible to predict the hypernym (e.g. color) of a word (e.g. red) from its word embedding. Our experimental results show that a simple logistic regression predicts the hypernyms of words well, which indicates that the word embeddings are linearly separable with respect to their hypernyms. Second, we provide a methodology to investigate whether word embeddings encode undesired sentiment. For instance, we show that the embeddings of “American” and “European” are surrounded by more positive words than the embedding of “Chinese”. Thus, we conclude that word embeddings can also capture sentiment, leading to undesired machine bias in downstream applications.
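The two probes described in the abstract can be illustrated with a minimal sketch. It assumes synthetic toy vectors rather than real embeddings, and every name in it (`train_logreg`, `positive_share`, the toy lexicon) is hypothetical. The abstract does not spell out the sentiment methodology, so counting positive words among the nearest neighbours is only one plausible reading, not the authors' confirmed procedure.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fitted by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = sigmoid(dot(w, x) + b) - t
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def positive_share(target, vectors, lexicon, k=3):
    """Fraction of positive-lexicon words among the k nearest neighbours."""
    ranked = sorted(vectors, key=lambda word: cosine(target, vectors[word]),
                    reverse=True)
    return sum(1 for word in ranked[:k] if lexicon[word] > 0) / k

# Probe 1: hypernym prediction on synthetic 8-dimensional "embeddings".
# Label 0 stands in for the hypernym "color", label 1 for "animal" (assumed).
random.seed(0)
centers = {0: [1.0] * 4 + [0.0] * 4, 1: [0.0] * 4 + [1.0] * 4}

def sample(label):
    return [c + random.gauss(0.0, 0.2) for c in centers[label]]

train = [(sample(label), label) for label in (0, 1) for _ in range(40)]
test = [(sample(label), label) for label in (0, 1) for _ in range(10)]
w, b = train_logreg([x for x, _ in train], [t for _, t in train])
accuracy = sum((dot(w, x) + b > 0) == bool(t) for x, t in test) / len(test)
print("hypernym probe accuracy:", accuracy)

# Probe 2: sentiment of the neighbourhood around a target word (toy 2-D data).
vectors = {"great": [1.0, 0.1], "good": [0.9, 0.2], "nice": [0.8, 0.3],
           "bad": [-1.0, 0.1], "awful": [-0.9, 0.2], "poor": [-0.8, 0.3]}
lexicon = {"great": 1, "good": 1, "nice": 1, "bad": -1, "awful": -1, "poor": -1}
share_a = positive_share([0.7, 0.2], vectors, lexicon)   # hypothetical target A
share_b = positive_share([-0.7, 0.2], vectors, lexicon)  # hypothetical target B
print("positive share around A:", share_a, "around B:", share_b)
```

With real data, the toy vectors would be replaced by pretrained embeddings and the labels by hypernyms from a lexical resource; a large gap in positive share between two target words would then be a signal of the kind of sentiment bias the abstract reports.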