Should Term-Relatedness Be Used in Text Representation?

ICCBR (2013)

Abstract
The variation in natural language vocabulary remains a challenge for text representation, as the same idea can be expressed in many different ways. Document representations therefore often rely on generalisation, mapping low-level lexical expressions to higher-level concepts in order to capture the inherent semantics of the documents. Term-relatedness measures are often used to generalise document representations by capturing semantic relationships between terms. In this work we conduct a comparative study of common term-relatedness metrics on 43 datasets and find that generalisation is not always beneficial. Given the computational overhead of generalisation, the ability to predict whether or not to generalise the indexing vocabulary of a dataset is important. Accordingly, we present a case-based approach that predicts, for a given text dataset, whether using generalisation will improve text retrieval performance. The evaluation shows that our approach correctly predicts which datasets are likely to benefit from generalisation with over 90% accuracy.
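The abstract does not say which term-relatedness metrics were compared. As a minimal sketch, assuming TF-IDF weighting and one common co-occurrence measure (cosine similarity between the binary document-occurrence vectors of two terms), the Python below shows how generalisation lets a query match a document with which it shares no terms. The toy corpus and all helper names (`relatedness_matrix`, `generalise`, and so on) are hypothetical illustrations, not the paper's method.

```python
import math
from typing import Dict, List

Vector = List[float]


def build_vocab(docs: List[List[str]]) -> List[str]:
    """Sorted list of distinct terms in the collection."""
    return sorted({t for d in docs for t in d})


def idf_weights(docs: List[List[str]], vocab: List[str]) -> Dict[str, float]:
    """Inverse document frequency: log(N / df(t))."""
    n = len(docs)
    return {t: math.log(n / sum(1 for d in docs if t in d)) for t in vocab}


def tfidf(tokens: List[str], vocab: List[str], idf: Dict[str, float]) -> Vector:
    """Raw term frequency times IDF, one weight per vocabulary term."""
    return [tokens.count(t) * idf[t] for t in vocab]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def relatedness_matrix(docs: List[List[str]], vocab: List[str]) -> List[Vector]:
    """Assumed measure: R[i][j] = cosine of the binary occurrence vectors of terms i and j."""
    occ = [[1.0 if t in d else 0.0 for d in docs] for t in vocab]
    return [[cosine(occ[i], occ[j]) for j in range(len(vocab))] for i in range(len(vocab))]


def generalise(vec: Vector, rel: List[Vector]) -> Vector:
    """Spread each term's weight onto related terms: v'_j = sum_i v_i * R[i][j]."""
    m = len(vec)
    return [sum(vec[i] * rel[i][j] for i in range(m)) for j in range(m)]


# Hypothetical toy corpus: "car" and "automobile" never co-occur,
# but both co-occur with "engine".
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "fuel"],
    ["apple", "banana", "fruit"],
]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)
rel = relatedness_matrix(docs, vocab)

query = tfidf(["automobile"], vocab, idf)
target = tfidf(docs[0], vocab, idf)  # shares no term with the query

print(cosine(query, target))  # 0.0: no lexical overlap
print(cosine(generalise(query, rel), generalise(target, rel)) > 0)  # True
```

The generalised vectors are denser than the originals, because each term passes weight to every term it co-occurs with; this is one way to see the computational overhead of generalisation that the abstract refers to.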
Keywords
News Story,Vector Space Model,Document Frequency,Inverse Document Frequency,Latent Semantic Indexing