Are Embedding Spaces Interpretable? Results of an Intrusion Detection Evaluation on a Large French Corpus.

International Conference on Language Resources and Evaluation (LREC)(2022)

引用 0|浏览17
暂无评分
摘要
Word embedding methods allow to represent words as vectors in a space that is structured using word co-occurrences so that words with close meanings are close in this space. These vectors are then provided as input to automatic systems to solve natural language processing problems. Because interpretability is a necessary condition to trusting such systems, interpretability of embedding spaces, the first link in the chain is an important issue. In this paper, we thus evaluate the interpretability of vectors extracted with two approaches: SPINE, a k-sparse auto-encoder, and SINr, a graph-based method. This evaluation is based on a Word Intrusion Task with human annotators. It is operated using a large French corpus, and is thus, as far as we know, the first large-scale experiment regarding word embedding interpretability on this language. Furthermore, contrary to the approaches adopted in the literature where the evaluation is performed on a small sample of frequent words, we consider a more realistic use-case where most of the vocabulary is kept for the evaluation. This allows to show how difficult this task is, even though SPINE and SINr show some promising results. In particular, SINr results are obtained with a very low amount of computation compared to SPINE, while being similarly interpretable.
更多
查看译文
关键词
word embeddings, interpretability, word intrusion task, human evaluation, french
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要