Retrieval of Multimedia Web Documents and Removal of Redundant Information

International Journal on Artificial Intelligence Tools (2011)

Abstract
This paper describes a search engine for multimedia web documents and a methodology for removing partially or totally redundant information from multiple documents in order to synthesize new documents. Here, a typical multimedia document contains free text and images and also has associated well-structured data. An SQL-like query language, WebSSQL, is proposed to retrieve documents of this type. WebSSQL differs from other proposed SQL extensions for retrieving web documents in two main ways: it is similarity-based, and it supports conditions on images. The paper also addresses the detection and removal of redundant information (text paragraphs and images) from multiple retrieved documents. Documents reporting the same or related events and stories may contain substantial redundant information. Removing the redundant information and synthesizing these documents into a single document saves both the time a user needs to acquire the information and the storage space needed to archive the data. The methodology reported here consists of techniques for analyzing text paragraphs and images, together with a set of similarity criteria used to detect redundant paragraphs and images. Examples are provided to illustrate these techniques.
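The abstract does not state which similarity criteria the paper uses, so the following is only a minimal sketch of similarity-based redundancy removal for text paragraphs. It assumes bag-of-words cosine similarity and a threshold of 0.8, both chosen here for illustration; the function names `term_vector`, `cosine_similarity`, and `remove_redundant` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def term_vector(paragraph):
    """Return lowercased bag-of-words term counts for one paragraph."""
    return Counter(re.findall(r"[a-z0-9]+", paragraph.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_redundant(paragraphs, threshold=0.8):
    """Keep a paragraph only if it is not too similar to one already kept.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    kept_vectors = []
    for paragraph in paragraphs:
        vector = term_vector(paragraph)
        if all(cosine_similarity(vector, kv) < threshold for kv in kept_vectors):
            kept.append(paragraph)
            kept_vectors.append(vector)
    return kept


if __name__ == "__main__":
    paragraphs = [
        "The storm hit the coast on Monday night.",
        "On Monday night the storm hit the coast.",
        "Officials opened three shelters for residents.",
    ]
    for paragraph in remove_redundant(paragraphs):
        print(paragraph)
```

In this sketch the second paragraph uses the same words as the first, so it is treated as redundant and dropped, while the third shares no terms with the first and is kept. A fuller treatment along the lines of the abstract would also need a comparable criterion for images, which is not attempted here.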
Keywords
search engine,query language