Aided Selection of Sampling Methods for Imbalanced Data Classification

CODS-COMAD 2021: Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD) (2021)

Abstract
Building an effective classifier for imbalanced data is challenging because most classifiers assume balanced data. Several sampling methods have therefore been devised to bridge this gap by re-sampling imbalanced datasets. Although sampling methods abound, no single method is best suited to all kinds of datasets and applications. Building classifiers for every sampling method and comparing the results using appropriate performance metrics is computationally inefficient. In this work, we propose a framework to find a relation between datasets and sampling methods via a set of meta-features that characterize the distribution of the data. We also take into account the effect of the probability threshold on the choice of sampling method. The main objective of this work is to develop an approach that aids the selection of one or more sampling methods, together with a probability threshold, to be used for building a suitable classifier for a given dataset. The approach is based on mapping functions learned between classifier performance and datasets after re-sampling. Extensive experiments are performed to validate the framework using synthetic as well as KEEL benchmark datasets.
Keywords
Classification, Class Imbalance, Sampling Methods, Meta-features, Classifier Performance Measures
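
The abstract refers to three ingredients: meta-features that describe a dataset, re-sampling, and a probability threshold. The sketch below illustrates each with a deliberately simple stand-in and is not the authors' framework. The imbalance ratio is assumed here as one example of a meta-feature, random oversampling as one example of a sampling method, and F1 as the performance measure. All function names are hypothetical.

```python
import random


def imbalance_ratio(labels):
    """Majority-to-minority class count ratio (an assumed example meta-feature)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if min(pos, neg) == 0:
        raise ValueError("both classes must be present")
    return max(pos, neg) / min(pos, neg)


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until both classes have the same size."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    if not pos_idx or not neg_idx:
        raise ValueError("both classes must be present")
    minority, majority = sorted((pos_idx, neg_idx), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive class when predicting 1 for scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
```

With made-up scores for eight negatives and two positives, the default cut-off of 0.5 can miss every positive while a lower threshold recovers them, which is why the threshold is worth choosing jointly with the sampling method:

```python
labels = [0] * 8 + [1] * 2
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.25, 0.2, 0.4, 0.35]  # illustrative values only
print(imbalance_ratio(labels))                      # ratio before re-sampling
print(best_threshold(scores, labels, [0.5, 0.3]))   # threshold chosen by F1
```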