OfCourse: web content discovery, classification and information extraction for online course materials

Proceedings of the 18th ACM Conference on Information and Knowledge Management (2009)

Abstract
In this paper we present OfCourse, a vertical search engine for online course materials. These materials have two characteristics: they are scattered very sparsely across university Web sites, and they are created by teachers using widely different HTML templates and layouts. These characteristics pose challenges both for Web classification (identifying the course materials) and for Web information extraction (extracting course metadata, such as course title, time and ID, from the identified course homepages). Here, we describe our proposed method for tackling these challenges, as well as the features of the system. OfCourse, which contains over 60,000 courses from the top 50 universities in the US, is currently available for public access at http://fusion.hpl.hp.com/OfCourse/.
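To make the two tasks named in the abstract concrete, the following is a minimal Python sketch of a course-page classifier and a course-metadata extractor. It is not the OfCourse method: the abstract does not describe the system's features or models, so the cue words in `COURSE_CUES`, the regular expressions, and the helper names `looks_like_course_page` and `extract_course_metadata` are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue words for spotting a course homepage (not OfCourse's features).
COURSE_CUES = ("syllabus", "lecture", "homework", "office hours",
               "instructor", "midterm", "final exam")

# Hypothetical pattern for a course ID such as "CS 229" or "EECS-281".
COURSE_ID_RE = re.compile(r"\b([A-Z]{2,5})[\s-]?(\d{2,4}[A-Z]?)\b")
# Hypothetical pattern for a course time (term) such as "Fall 2009".
TERM_RE = re.compile(r"\b(Spring|Summer|Fall|Autumn|Winter)\s+(\d{4})\b",
                     re.IGNORECASE)


@dataclass
class CourseMetadata:
    title: Optional[str]
    course_id: Optional[str]
    term: Optional[str]


def looks_like_course_page(text: str, min_cues: int = 2) -> bool:
    """Toy classifier: True if at least `min_cues` distinct cue words occur."""
    lowered = text.lower()
    hits = sum(1 for cue in COURSE_CUES if cue in lowered)
    return hits >= min_cues


def extract_course_metadata(page_title: str, text: str) -> CourseMetadata:
    """Toy extractor: search the page title first, then the body text."""
    id_match = COURSE_ID_RE.search(page_title) or COURSE_ID_RE.search(text)
    term_match = TERM_RE.search(page_title) or TERM_RE.search(text)
    # Normalize "EECS-281" to "EECS 281".
    course_id = f"{id_match.group(1)} {id_match.group(2)}" if id_match else None
    # Normalize "fall 2009" to "Fall 2009".
    term = (f"{term_match.group(1).capitalize()} {term_match.group(2)}"
            if term_match else None)
    # The title is taken verbatim from the page title string.
    title = page_title.strip() or None
    return CourseMetadata(title=title, course_id=course_id, term=term)
```

Keyword counting and regular expressions are used here only because they are short and transparent; they are brittle across the varied templates and layouts the abstract describes.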
Keywords
web information extraction, course metadata, online course material, different html template, course homepages, web content discovery, university web site, course title, course material, information extraction, following characteristic, web classification, search engine