Grounding sloWNet in Corpus Data

Slovenščina 2.0: empirical, applied and interdisciplinary research (2013)

Abstract
A wordnet can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. Its major disadvantage, however, is that the resulting resource does not necessarily reflect the language in question. In this paper we therefore test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can use minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. Since all previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results of the presented experiment are a welcome complement to the existing semantic network.
Keywords
lexical semantics, wordnet, semantic similarity, semantic relations