Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering

Expert Syst. Appl. (2017)

Abstract
Three meta-heuristic algorithms are adapted to solve the feature selection problem. The feature selection methods are established based on a novel weighting scheme. A dimension reduction technique is proposed to reduce the number of features. The k-means clustering algorithm is applied to the features obtained. The proposed methods outperform the comparative methods.

This paper proposes three feature selection algorithms with a feature weighting scheme and dynamic dimension reduction for the text document clustering problem. Text document clustering is a new trend in text mining; in this process, text documents are separated into several coherent clusters according to carefully selected informative features, using a suitable evaluation function that usually depends on term frequency. Informative features in each document are selected using feature selection methods. Genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO) are the most successful feature selection methods established using a novel weighting scheme, namely length feature weight (LFW), which depends on term frequency and on the appearance of features in other documents. A new dynamic dimension reduction (DDR) method is also provided to reduce the number of features used in clustering and thus improve the performance of the algorithms. Finally, k-means, a popular clustering method, is used to cluster the set of text documents based on the terms (or features) obtained by dynamic reduction. The methods are evaluated on seven text mining benchmark datasets of different sizes and complexities. Analysis with k-means shows that particle swarm optimization with length feature weight and dynamic reduction produces the best outcomes for almost all datasets tested. This paper provides new alternatives for the text mining community to cluster text documents by using cohesive and informative features.
Keywords
Feature selection, Dynamic dimension reduction, Text document clustering, Weight score, Metaheuristics
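
The pipeline described in the abstract (weight the terms, keep a reduced set of informative features, then cluster with k-means) can be illustrated with a small sketch. This is not the authors' method: the abstract does not give the LFW formula or the details of DDR, so plain TF-IDF stands in for the weighting scheme and a simple top-k filter stands in for the GA/HS/PSO search and the dimension reduction step. The function names (`tfidf_matrix`, `select_top_features`, `kmeans`) and the toy documents are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in doc i.

    Assumption: TF-IDF is a stand-in for the paper's LFW scheme.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, rows


def select_top_features(vocab, rows, keep):
    """Keep the `keep` terms with the highest total weight.

    Assumption: this filter replaces the metaheuristic search (GA/HS/PSO)
    and the dynamic dimension reduction step described in the abstract.
    """
    totals = [sum(row[j] for row in rows) for j in range(len(vocab))]
    best = sorted(range(len(vocab)), key=lambda j: (-totals[j], vocab[j]))[:keep]
    best.sort()
    return [vocab[j] for j in best], [[row[j] for j in best] for row in rows]


def kmeans(rows, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(rows[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(rows, key=lambda r: min(dist2(r, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist2(r, centroids[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus invented for this example: three finance and three sports documents.
docs = [
    "stock market stock prices",
    "market prices stock investors",
    "stock investors market bank",
    "football team football match",
    "team match football fans",
    "football fans team coach",
]

vocab, rows = tfidf_matrix(docs)
selected, reduced = select_top_features(vocab, rows, keep=6)
labels = kmeans(reduced, k=2)
print(selected)  # ['fans', 'football', 'investors', 'match', 'prices', 'stock']
print(labels)    # [0, 0, 0, 1, 1, 1]
```

Even with only six of the ten terms retained, the two topics separate cleanly, which is the general idea behind clustering on a reduced set of informative features. A real implementation of the paper's approach would replace the weighting and selection functions with LFW and one of the metaheuristic searches.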