
Semi-supervised feature selection using maximum mutual information and minimum correlated feature set retrieved by augmented learning

crossref(2024)

Abstract
Feature selection is a critical pre-processing step in machine learning. In supervised problems, class labels are used to identify important features. However, labeling data is labor-intensive and therefore costly, so unlabeled data is often abundant while labeled data is limited. This makes semi-supervised learning highly pertinent, and feature selection is just as relevant in the semi-supervised setting. In this work, a new semi-supervised feature selection method is proposed. First, a gradient boosting classifier is used to label the unlabeled portion of the data and thereby augment the training set. Repeated sampling from the unlabeled portion generates multiple augmented training sets. From each augmented training set, the top-k features are selected by mutual information, and a correlation coefficient is used to ensure that the selected features are not redundant. A voting-based approach then combines these multiple feature sets. The proposed method is compared with (a) supervised feature selection on the full dataset (the benchmark) and (b) supervised feature selection on the labeled portion only. Across 18 datasets, semi-supervised feature selection outperforms supervised feature selection on the labeled portion by 2.78% and 2.63% in F1 score in two different configurations. It also outperforms the benchmark by 0.36% and 1.12%.
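The pipeline described in the abstract can be sketched in Python with NumPy and scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.8 absolute Pearson correlation threshold, the sampling fraction, the number of rounds, and the rule of keeping the k most-voted features are all hypothetical choices, because the abstract does not specify them. Mutual information is estimated with scikit-learn's mutual_info_classif, and redundancy is checked greedily against the features already chosen in each round.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import mutual_info_classif


def select_top_k_nonredundant(X, y, k, corr_threshold=0.8, random_state=0):
    """Greedily pick up to k features in descending mutual-information order,
    skipping any feature whose |Pearson r| with an already-picked feature
    exceeds corr_threshold (hypothetical threshold, not from the paper)."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    selected = []
    for j in np.argsort(mi)[::-1]:  # highest mutual information first
        if len(selected) == k:
            break
        redundant = False
        for s in selected:
            r = np.corrcoef(X[:, j], X[:, s])[0, 1]
            if np.isfinite(r) and abs(r) > corr_threshold:
                redundant = True
                break
        if not redundant:
            selected.append(int(j))
    return selected


def semi_supervised_feature_selection(X_lab, y_lab, X_unlab, k, n_rounds=10,
                                      sample_frac=0.5, corr_threshold=0.8, seed=0):
    """Hypothetical sketch: pseudo-label, resample, select per round, then vote."""
    rng = np.random.default_rng(seed)

    # Step 1: fit gradient boosting on the labeled data and pseudo-label the rest.
    clf = GradientBoostingClassifier(random_state=seed).fit(X_lab, y_lab)
    y_pseudo = clf.predict(X_unlab)

    votes = Counter()
    n_draw = max(1, int(sample_frac * len(X_unlab)))  # assumed sample size per round
    for r in range(n_rounds):
        # Step 2: sample pseudo-labeled rows (without replacement) to build one augmented set.
        idx = rng.choice(len(X_unlab), size=n_draw, replace=False)
        X_aug = np.vstack([X_lab, X_unlab[idx]])
        y_aug = np.concatenate([y_lab, y_pseudo[idx]])

        # Step 3: top-k non-redundant features for this augmented set.
        votes.update(select_top_k_nonredundant(X_aug, y_aug, k, corr_threshold, random_state=r))

    # Step 4: keep the k features that received the most votes (assumed voting rule).
    return [f for f, _ in votes.most_common(k)]


# Toy demo: y depends on features 0 and 1; feature 2 is a near-copy of feature 0.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(400, 6))
X[:, 2] = X[:, 0] + 0.01 * data_rng.normal(size=400)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_lab, y_lab, X_unlab = X[:40], y[:40], X[40:]  # only 10% of rows keep their labels

features = semi_supervised_feature_selection(X_lab, y_lab, X_unlab, k=2)
print(sorted(features))
```

In this sketch the redundancy check runs inside each round, so the final vote does not by itself guarantee that two highly correlated features are never both returned; a last redundancy pass over the voted list would be one way to enforce that.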