Extract knowledge from semi-structured websites for search task simplification.

CIKM '11: International Conference on Information and Knowledge Management, Glasgow, Scotland, UK, October 2011

Abstract
Simplifying the key tasks of search engine users by directly returning structured knowledge in response to their queries is attracting much attention from both industry and academia. A bottleneck of this challenging problem is how to automatically extract structured knowledge from noisy and complex Web-scale websites. In this paper, we propose an unsupervised automatic wrapper induction algorithm named Scalable Knowledge Extractor from webSites (SKES). SKES induces the wrapper in a divide-and-conquer fashion: it divides the general wrapper into several sub-wrappers, each of which is learned from the data independently. By employing techniques such as a tag path representation of Web pages, SKES is shown experimentally to be efficient and noise-tolerant. Furthermore, based on the automatically extracted knowledge, we built a prototype that serves structured knowledge to end users to simplify their key search tasks. The prototype received very positive feedback.
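The abstract does not spell out how the tag path representation is built, so the following is only a minimal sketch of the general idea, not the SKES algorithm itself. It assumes a tag path is the chain of open element names leading to a text node, and it uses Python's standard `html.parser`. The names `TagPathCollector`, `tag_paths`, and `group_by_path` are hypothetical and introduced purely for illustration.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records (tag_path, text) for every non-empty text node.

    Illustrative assumption: a tag path is the '/'-joined list of
    currently open element names. The paper may define it differently.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, outermost first
        self.records = []  # collected (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.records.append(("/".join(self.stack), text))


def tag_paths(html):
    """Return the list of (tag_path, text) pairs found in an HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.records


def group_by_path(records):
    """Group text values by tag path, preserving document order."""
    groups = {}
    for path, text in records:
        groups.setdefault(path, []).append(text)
    return groups


if __name__ == "__main__":
    page = ("<html><body><ul>"
            "<li><b>Title</b><span>A</span></li>"
            "<li><b>Title</b><span>B</span></li>"
            "</ul></body></html>")
    for path, values in group_by_path(tag_paths(page)).items():
        print(path, values)
```

In this toy page, text nodes that share a tag path fall into the same group. A value that repeats unchanged across records (here "Title") looks like template text, while values that vary (here "A" and "B") look like data. Treating each path group separately is one plausible reading of the "sub-wrapper" idea, but that mapping is an assumption of this sketch rather than something the abstract states.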