P-035: AI-based models for the identification of critical genetic biomarkers to distinguish MM from MGUS using the WES data

Clinical Lymphoma, Myeloma & Leukemia(2021)

Abstract
Background: Multiple myeloma (MM) is preceded by the premalignant stage monoclonal gammopathy of undetermined significance (MGUS), so it is important to identify the genetic factors responsible for progression from MGUS to MM. We built machine learning (ML) models to identify pivotal genetic biomarkers that distinguish MM from MGUS.

Methods: Tumor-normal matched whole-exome sequencing (WES) data from 1174 MM patients and 61 MGUS patients were analyzed. The data were obtained from dbGaP (phs000748; phs000348), AIIMS, Delhi, India, and EGA (EGA1001901). Variants were identified using four variant callers (MuSE, Mutect2, VarScan2, and SomaticSniper), and SNVs were annotated using ANNOVAR. The pooled genomic annotations were analyzed with the ‘dndscv’ tool to derive significantly mutated genes. The union of the 250 top-ranked significantly mutated genes from each variant caller yielded 1316 genes. For each gene, the variant count and the maximum, mean, median, and standard deviation of the variant allele frequency (VAF) and allelic depth (AD) were used as features; these were reduced by principal component analysis (PCA), and only the top three principal components were retained for each gene. Next, five ML classifiers (random forest, decision tree, logistic regression, XGBoost, and SVM) were used to distinguish MM from MGUS. Class imbalance (95% MM and 5% MGUS cases) was handled with a cost-sensitive loss function in the classifiers. Permutation-based feature importance was computed on the two best-performing models to infer the most significant features, which were mapped back to genes to obtain the top-ranking genes for MM and MGUS.

Results: The cost-sensitive SVM outperformed the other models, with a balanced accuracy of 95.5%, weighted F1-score of 94.82%, Matthews correlation coefficient (MCC) of 0.8162, precision of 76.49%, recall of 98.33%, and area under the curve (AUC) of 95.5%. Top-ranking genes identified for MM were HLA-DQB1, IRF1, MUC6, FGFR3, MUC4, HOXA1, ITPR3, HIST1H1E, MUC12, ITGA2, HLA-DQA2, HUWE1, IGLL5, HLA-DRB5, HLA-DQB2, and ILK. Top-ranking genes identified for MGUS were MUC3A, HLA-A, HLA-C, IRF4, JAK1, HDAC2, HLA-DQA1, FRG1, HS6ST1, H2AFV, and HLA-DRB1. The top two ML classifiers found HLA-DQB1, IRF1, ITPR3, HOXA1, HIST1H1E, HUWE1, IGLL5, HIPK3, HLA-DQA2, HLA-DRB5, and ILK significant for MM, and HLA-A, HLA-C, IRF4, JAK1, HDAC2, HLA-DQA1, HS6ST1, H2AFV, and HLA-DRB1 significant for MGUS. All of these genes have been reported in the literature as significant for MM and MGUS.

Conclusion: MGUS and MM share many features, such as genomic biomarkers and structural variants, but mutations have less impact in MGUS, which makes the two conditions challenging to distinguish. Here, we used ML classifiers to distinguish MM from MGUS. The classifiers identify the genes most useful for MM vs. MGUS classification, which can lead to a better understanding of progression from MGUS to MM.
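
The per-gene summary features, the class weighting, and two of the reported metrics can be illustrated with a minimal standard-library sketch. It is not the authors' code. The function names are hypothetical, the use of the population standard deviation is an assumption, and the inverse-class-frequency weighting is only one common way to make a loss cost-sensitive (the abstract does not state which scheme was used). PCA, the classifiers themselves, and permutation importance are left out.

```python
import math
import statistics
from collections import Counter


def summarize(values):
    """Count, max, mean, median and std of one gene's per-variant values
    (e.g. VAF or AD) in one sample. Population std is an assumption."""
    if not values:
        # No variant called in this gene for this sample.
        return {"count": 0, "max": 0.0, "mean": 0.0, "median": 0.0, "std": 0.0}
    return {
        "count": len(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),
    }


def inverse_frequency_weights(labels):
    """Class weight = n_samples / (n_classes * class_count).
    Assumed weighting scheme; rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Cohort sizes taken from the Methods: 1174 MM and 61 MGUS patients.
labels = ["MM"] * 1174 + ["MGUS"] * 61
weights = inverse_frequency_weights(labels)
```

Under this assumed scheme, misclassifying an MGUS case costs roughly 19 times as much as misclassifying an MM case (1174 / 61), which is what keeps a classifier from simply predicting MM for every sample. Balanced accuracy and MCC are reported for the same reason: unlike plain accuracy, neither can be driven up by favoring the majority class.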