Effective Topic Distillation With Key Resource Pre-Selection

Yq Liu, M Zhang, Y Liu

INFORMATION RETRIEVAL TECHNOLOGY (2005)

Abstract
Topic distillation aims to find key resources, that is, high-quality pages for given topics. Based on an analysis of the non-content features of key resources, a pre-selection method is introduced into topic distillation research. A decision tree is constructed to locate key resource pages using query-independent non-content features, including in-degree, document length, URL type, and two new features derived from an analysis of a site's self-link structure. Although the resulting page set contains only about 20% of the pages in the whole collection, it covers more than 70% of the key resources. Furthermore, retrieval on this page set improves performance by more than 60% relative to retrieval on all pages. These results were obtained using the TREC 2002 web track topic distillation task for training and the corresponding TREC 2003 task for testing. They show an effective way to achieve better topic distillation performance with a significantly smaller dataset.
Keywords
key resource page, key resource pre-selection, non-content feature, key resource, high-quality page, web track topic distillation, topic distillation, corresponding task, effective topic distillation, topic distillation research, certain topic, page set, decision tree, information retrieval
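
Illustrative sketch

The pre-selection idea in the abstract can be sketched roughly as follows. This is not the authors' decision tree: the rules and thresholds in `is_candidate` are hand-written placeholders, the URL-type labels ("root", "subroot", "path", "file") are an assumed categorization, the two self-link features are omitted because the abstract does not define them, and the pages are made-up toy data. The sketch only shows how a query-independent filter shrinks the collection and how its coverage of key resources could be measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    in_degree: int    # number of in-links pointing at the page
    doc_length: int   # document length, here counted in words (assumption)
    url_type: str     # assumed labels: "root", "subroot", "path", "file"


def is_candidate(page: Page) -> bool:
    """Hypothetical stand-in for a learned decision tree.

    The thresholds below are invented for illustration only.
    """
    if page.url_type in ("root", "subroot"):
        return True
    if page.in_degree >= 10:
        return page.doc_length >= 100
    return False


def pre_select(pages):
    """Keep only pages the query-independent filter accepts."""
    return [p for p in pages if is_candidate(p)]


def coverage(selected, key_urls):
    """Fraction of known key resources that survive pre-selection."""
    if not key_urls:
        return 0.0
    hits = sum(1 for p in selected if p.url in key_urls)
    return hits / len(key_urls)


# Toy collection (fabricated example data, not TREC data).
pages = [
    Page("http://a.example/", 50, 300, "root"),
    Page("http://a.example/docs/", 12, 500, "subroot"),
    Page("http://a.example/docs/x.html", 1, 80, "file"),
    Page("http://b.example/news/y.html", 15, 40, "file"),
    Page("http://b.example/pubs/z.html", 20, 900, "file"),
]
key_urls = {
    "http://a.example/",
    "http://b.example/pubs/z.html",
    "http://a.example/docs/x.html",
}

selected = pre_select(pages)
size_ratio = len(selected) / len(pages)   # share of the collection kept
cov = coverage(selected, key_urls)        # share of key resources kept
```

In a setup like the one the abstract describes, the filter would be trained on one year's relevance judgments and evaluated on another, and retrieval would then run only over the selected pages.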