A Class-Imbalanced Study with Feature Extraction via PCA and Convolutional Autoencoder

2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI) (2022)

Abstract
It is inherently challenging to train a machine learning algorithm on a class-imbalanced dataset. Under conditions of high dimensionality, this training process can become even more difficult due to the large number of features in the dataset. During preprocessing, data sampling is commonly used to address class imbalance and feature extraction is frequently used to reduce the number of dataset features. In this study, we explore the use of these two preprocessing activities before passing on the data to four ensemble classifiers (Random Forest, CatBoost, LightGBM, and XGBoost). With reference to feature extraction, the Principal Component Analysis (PCA) and Convolutional Autoencoder (CAE) methods are evaluated. With regard to data sampling, the Random Undersampling (RUS) and Synthetic Minority Oversampling Technique (SMOTE) methods are evaluated. Classification performance is measured with the Area Under the Receiver Operating Characteristic Curve (AUC) metric. Our results indicate that the implementation of the RUS method followed by the CAE method leads to the best classification performance.
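As a rough illustration of two pieces named in the abstract, the sketch below shows Random Undersampling (RUS) and an AUC computation using only the Python standard library. It is not the authors' implementation: the function names `random_undersample` and `roc_auc` are hypothetical, binary labels encoded as 0 and 1 are assumed, and balancing to a 1:1 class ratio is an assumption, since the abstract does not state the sampling ratio. The feature extraction step (PCA or CAE) and the ensemble classifiers are omitted because they depend on external libraries.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly keep n_min samples per class, where n_min is the size of
    the smallest class (assumed 1:1 target ratio, not taken from the paper)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        kept.extend(rng.sample(idxs, n_min))
    kept.sort()  # preserve the original ordering of the surviving rows
    return [X[i] for i in kept], [y[i] for i in kept]


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a pipeline of the kind the abstract describes, `random_undersample` would be applied to the training split only, the feature extractor would then be fitted on the balanced data, and `roc_auc` would be evaluated on classifier scores for an untouched test split. The pairwise AUC shown here is quadratic in the number of samples and is meant for clarity rather than for large datasets.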
Keywords
feature extraction, data sampling, class imbalance, PCA, convolutional autoencoder