A Novel Efficient Classification Algorithm for Search Engines

Vienna (2008)

Abstract
In this paper, a new algorithm is proposed for classifying Web documents into a set of categories. The proposed technique analyzes the relationships between documents and the terms they contain, producing a set of rules that relate a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent word combinations with its category, and the relationships between these graphs and the documents' categories are captured. The technique has three phases. The first is a training phase, in which human experts determine the categories of various Web pages and articles and assign these categories appropriate weighted indexes. The second is a blind categorization phase, in which a database is built and categorized according to the results of the first phase. The third phase applies the proposed graph representation technique to the whole set of documents in each category to determine that category's final graph representation. This third phase yields better classification rules because it works with a larger sample at no additional cost of supervised categorization. Experiments are conducted on data sets collected from different Web portals.
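The abstract does not specify the graph construction, the experts' weighted index, or the format of the rules, so the following is only a minimal Python sketch of the three-phase idea under stated assumptions. Documents are tokenized with a simple regex; a document graph links its `top_k` most frequent words whenever two of them co-occur within a small `window`; category graphs are merged by summing edge weights; and a document is assigned to the category with the highest normalized edge overlap. The names `doc_graph`, `category_graphs`, `classify`, and `three_phase`, the parameters `top_k` and `window`, and the overlap score are all hypothetical. The sketch substitutes graph similarity for the paper's rule extraction and omits the weighted index.

```python
import re
from collections import Counter, defaultdict


def doc_graph(text, top_k=10, window=3):
    """Weighted co-occurrence graph over a document's top_k most frequent words.

    Hypothetical construction: an undirected edge (a, b) gains weight 1 each
    time two distinct top words appear within `window` tokens of each other.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    top = {w for w, _ in Counter(tokens).most_common(top_k)}
    edges = Counter()
    for i, a in enumerate(tokens):
        if a not in top:
            continue
        for b in tokens[i + 1:i + window]:
            if b in top and b != a:
                edges[tuple(sorted((a, b)))] += 1
    return edges


def category_graphs(labeled):
    """Merge the graphs of all (text, category) pairs by summing edge weights."""
    merged = defaultdict(Counter)
    for text, cat in labeled:
        merged[cat].update(doc_graph(text))
    return dict(merged)


def classify(text, cats):
    """Assign the category whose graph best overlaps the document's graph.

    Assumed score: sum of (doc edge weight x category edge weight), divided by
    the category graph's total weight. This is not the paper's rule set.
    """
    g = doc_graph(text)

    def score(cg):
        total = sum(cg.values()) or 1
        return sum(w * cg[e] for e, w in g.items()) / total

    return max(cats, key=lambda c: score(cats[c]))


def three_phase(expert_labeled, unlabeled):
    # Phase 1: build category graphs from expert-labeled training pages.
    cats = category_graphs(expert_labeled)
    # Phase 2: blind categorization of the unlabeled collection.
    auto_labeled = [(t, classify(t, cats)) for t in unlabeled]
    # Phase 3: rebuild each category graph from the whole, larger set.
    return category_graphs(expert_labeled + auto_labeled)


if __name__ == "__main__":
    expert = [("football match goal team score goal team", "sports"),
              ("stock market price shares market price", "finance")]
    final = three_phase(expert, ["team goal match football",
                                 "market shares price stock"])
    print(classify("goal team score", final))  # sports
```

Phase 3 here simply re-merges graphs over the expert-labeled plus auto-labeled documents, mirroring the abstract's point that the final category graphs come from a larger sample without further manual labeling; note that any phase-2 misclassifications are folded into the final graphs as well.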
Keywords
search engines, search engine, sample size, indexation, graph representation, web pages, graph theory, data mining, accuracy, classification algorithms, correlation, information processing