Computational Microarray Gene Selection Model Using Metaheuristic Optimization Algorithm for Imbalanced Microarrays Based on Bagging and Boosting Techniques

Model and Data Engineering (2022)

Abstract
Genomic microarray databases encompass complex, high-dimensional gene expression samples. An imbalanced microarray dataset is one in which genomic samples are unevenly distributed among the classes, which can degrade classification performance. Gene selection from an imbalanced microarray dataset can therefore nominate misleading and inconsistent genes, which in turn alters classification performance. This unsatisfactory performance stems from the skew of the sample distribution toward the majority class. In this paper, we propose a modified version of the Emperor Penguin Optimization (EPO) algorithm combined with Random Forest (RF) bagging and boosting classification, named EPO-RF, to select the most informative genes based on classification accuracy using imbalanced microarray datasets. The modified EPO is built on decision trees that take the tree-splitting weight criterion into consideration in order to handle imbalanced microarray datasets. Average gene expression binary values are used as a preliminary step for exploring disease trajectories with the aid of metaheuristic optimization feature selection algorithms. Results show that the proposed model outperforms well-established metaheuristic optimization algorithms, e.g., Harris Hawks Optimization (HHO), Grey Wolf Optimization (GWO), Salp Swarm Optimization (SSO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs), on several pediatric sepsis microarray datasets for patients who were admitted to the Intensive Care Unit (ICU) for the first 24 h.
Keywords
Gene selection, Imbalanced microarray, Metaheuristic, Oversampling, Random Forest
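
The abstract's point that a skew toward the majority class distorts accuracy-based gene selection can be illustrated with a minimal sketch. It is not the authors' EPO-RF implementation. The function names, the toy data, and the reading of "average gene expression binary values" as thresholding each gene at its mean across samples are all assumptions made for illustration.

```python
from statistics import mean


def binarize_by_mean(expression):
    """Assumed preprocessing: 1 if a sample's value exceeds that gene's mean, else 0.

    `expression` is a list of samples, each a list of per-gene values.
    """
    n_genes = len(expression[0])
    gene_means = [mean(row[g] for row in expression) for g in range(n_genes)]
    return [
        [1 if row[g] > gene_means[g] else 0 for g in range(n_genes)]
        for row in expression
    ]


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


# Hypothetical toy data: 3 samples x 2 genes.
toy_expression = [[1.0, 5.0], [3.0, 5.0], [5.0, 8.0]]
toy_binary = binarize_by_mean(toy_expression)

# Hypothetical imbalanced labels: 8 majority (0) and 2 minority (1) samples.
labels = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the majority class.
majority_only = [0] * 10

plain = accuracy(labels, majority_only)
balanced = balanced_accuracy(labels, majority_only)
```

In this toy case the majority-only predictor scores 0.8 on plain accuracy but only 0.5 on balanced accuracy. A gene subset scored by plain accuracy on skewed data can therefore look informative while never identifying a minority-class sample.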