Using HTML Formatting to Aid in Natural Language Processing on the World Wide Web
msra(1998)
Abstract
Because of its magnitude and the fact that it is not computer understandable, the World Wide Web has become a prime candidate for automatic natural language tasks. This thesis argues that there is information in the layout of a web page, and that by looking at the HTML formatting in addition to the text on a page, one can improve performance in tasks such as learning to classify segments of documents. A rich representation for web pages, the HTML Struct Tree, is described. A parsing algorithm...
Keywords
world wide web, web pages, natural language processing, natural language