Online streaming feature selection using rough sets

International Journal of Approximate Reasoning (2016)

Abstract
Feature selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows over time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is that the entire feature set is unavailable before learning starts. Several efforts have been made to address the OSFS problem; however, they all need some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this choice is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses the classical significance analysis concepts of RS theory to control the unknown feature space in OSFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run-time, and robustness against noise. Experimental results demonstrate that the algorithm achieves better results than existing OSFS algorithms on all of these criteria.
Keywords
Feature selection, Online streaming feature selection, Rough sets theory, Significance
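
The abstract refers to the classical significance analysis of rough set theory without spelling it out. As a rough illustration only, the sketch below computes the textbook dependency degree, gamma_B(D) = |POS_B(D)| / |U|, and the significance of a feature as the drop in that degree when the feature is removed. It is not the OS-NRRSAR-SA algorithm, whose details the abstract does not give; the function names `dependency` and `significance` and the toy data are assumptions made for this example.

```python
from collections import defaultdict

def dependency(rows, labels, features):
    """Textbook rough-set dependency degree: |POS_B(D)| / |U|."""
    if not rows:
        return 0.0
    label_sets = defaultdict(set)   # equivalence class -> decision labels seen
    sizes = defaultdict(int)        # equivalence class -> number of objects
    for row, label in zip(rows, labels):
        # Objects with identical values on `features` are indiscernible.
        key = tuple(row[f] for f in features)
        label_sets[key].add(label)
        sizes[key] += 1
    # A class lies in the positive region if all its objects share one label.
    positive = sum(sizes[k] for k, labs in label_sets.items() if len(labs) == 1)
    return positive / len(rows)

def significance(rows, labels, features, feature):
    """Drop in dependency degree when `feature` is removed from `features`."""
    rest = [f for f in features if f != feature]
    return dependency(rows, labels, features) - dependency(rows, labels, rest)

# Toy data (hypothetical): the label equals feature 0; feature 1 is irrelevant.
ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    print(significance(ROWS, LABELS, [0, 1], 0))  # feature 0 is indispensable
    print(significance(ROWS, LABELS, [0, 1], 1))  # feature 1 adds nothing
```

In a streaming setting, a measure of this kind could be re-evaluated each time a new feature arrives, using only the data seen so far. How the paper actually uses it to accept or discard features is not stated in the abstract.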