Boosting the performance of over-sampling algorithms through under-sampling the minority class.

Neurocomputing (2019)

Citations 36 | Views 13
Abstract
Over-sampling algorithms are the most widely adopted approach to balancing class distributions in imbalanced data problems, through random replication or synthesis of new examples in the minority class. Current over-sampling algorithms, however, usually use all available examples in the minority class to synthesise new instances, which may include noisy or outlier data. This work proposes k-INOS, a new algorithm that prevents over-sampling algorithms from being contaminated by noisy examples in the minority class. k-INOS is based on the concept of neighbourhood of influence and works as a wrapper around any over-sampling algorithm. Comprehensive experiments were conducted to test k-INOS on 50 benchmark data sets, with 8 over-sampling methods and 5 classifiers, and with performance measured according to 7 metrics and the Wilcoxon signed-ranks test. Results showed that, particularly (but not only) for weak classifiers, k-INOS significantly improved the performance of over-sampling algorithms on most performance metrics. Further investigation also identified conditions under which k-INOS is likely to increase performance, according to features and rates measured from the data sets. This extensive experimental framework establishes k-INOS as an effective algorithm to apply prior to over-sampling methods.
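The abstract describes k-INOS as a wrapper that cleans the minority class before any over-sampling method runs. The abstract does not define the "neighbourhood of influence" criterion, so the sketch below is only an illustration of the wrapper idea under an assumed stand-in rule: a minority point is treated as noisy if none of its k nearest neighbours belong to the minority class. The function names (`knn_labels`, `filter_minority`, `wrapped_oversample`) and the random-replication over-sampler are hypothetical, not the paper's method.

```python
# Hedged sketch of a k-INOS-style wrapper: filter presumed-noisy minority
# examples, then hand the cleaned set to an over-sampler. The noise rule
# (no minority neighbour among the k-NN) is an assumption, not the paper's
# actual "neighbourhood of influence" definition.
import math
import random

def knn_labels(point, data, labels, k):
    """Class labels of the k nearest neighbours of `point` (excluding itself)."""
    dists = sorted(
        (math.dist(point, q), lab)
        for q, lab in zip(data, labels)
        if q != point
    )
    return [lab for _, lab in dists[:k]]

def filter_minority(data, labels, minority, k):
    """Keep only minority points with at least one minority neighbour among their k-NN."""
    return [
        p for p, lab in zip(data, labels)
        if lab == minority and minority in knn_labels(p, data, labels, k)
    ]

def wrapped_oversample(data, labels, minority, k=3, seed=0):
    """Filter noisy minority points, then random-replication over-sample to balance."""
    rng = random.Random(seed)
    clean = filter_minority(data, labels, minority, k)
    majority_n = sum(1 for lab in labels if lab != minority)
    extra = [rng.choice(clean) for _ in range(majority_n - len(clean))]
    return clean + extra

# Toy data: a majority cluster near the origin, a minority cluster at (5, 5),
# and one minority outlier sitting inside the majority region.
majority_pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
minority_pts = [(5, 5), (5, 6), (0.5, 0.5)]  # (0.5, 0.5) is the noisy point
data = majority_pts + minority_pts
labels = [0] * len(majority_pts) + [1] * len(minority_pts)

result = wrapped_oversample(data, labels, minority=1)
# The noisy minority point is filtered out; the remaining minority cluster
# is replicated up to the majority-class size.
```

In this toy run the outlier at (0.5, 0.5) has only majority neighbours and is dropped, so replication draws only from the genuine minority cluster, which is the contamination-prevention effect the abstract attributes to k-INOS. In the paper, the replication step would be replaced by any over-sampling algorithm (e.g. a SMOTE-style synthesiser).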
Keywords
Imbalanced learning, Over-sampling, Under-sampling, Noisy data