Multi-ensemble machine learning framework for omics data integration: A case study using breast cancer samples

Kunal Tembhare, Tina Sharma, Sunitha M. Kasibhatla, Archana Achalere, Rajendra Joshi

Informatics in Medicine Unlocked (2024)

Abstract
Integrating large volumes of omics data helps unravel the biological complexity associated with different disease phenotypes. Machine learning (ML) approaches offer systematic techniques for multi-omics data integration. In this study, survival prediction for breast cancer patients was undertaken using omics data from 302 female patients in The Cancer Genome Atlas (TCGA). The data comprised gene expression, miRNA expression, DNA methylation, and copy number variation. Three computational multi-ensemble ML pipelines were tested using the Support Vector Machine (SVM), Random Forest (RF), and Partial Least Squares-Discriminant Analysis (PLS-DA) algorithms. To overcome the limitations of univariate feature selection criteria, the ML pipelines were built together with latent factors obtained by a multivariate dimension reduction method. This facilitated investigation of the underlying genetic networks and identification of potential hub genes. The results showed that SVM with the PLS-DA method (integrating the gene expression, DNA methylation, and miRNA expression modalities) was the best-performing model, with an Area Under the Curve (AUC) of 89% and an accuracy of 83% for survival prediction. The study not only corroborated previously reported breast cancer-specific prognostic biomarkers but also predicted additional potential biomarkers. The work demonstrates that a multi-ensemble ML model with efficient feature selection can serve as a robust protocol for correlating cancer genotype with phenotype.
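The abstract reports performance as AUC and accuracy. As a reminder of what these two numbers measure, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and are unrelated to the TCGA data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 and 0 are the two outcome classes.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.1]  # assumed classifier scores

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two figures in the abstract differ.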
Keywords
Multi-omics integration, Supervised machine learning, Breast cancer survival, Biomarker prediction