PLS-based gene subset augmentation and tumor-specific gene identification

Computers in Biology and Medicine (2024)

Abstract
In the study of tumor disease pathogenesis, the identification of genes specifically expressed in disease states is pivotal, yet challenges arise from high-dimensional datasets with limited samples. Conventional gene (feature) selection methods often fall short of capturing the complexity of gene-phenotype and gene-gene interactions, necessitating a more robust analysis method. To address these challenges, a gene subset augmentation strategy is proposed in this paper. Our approach introduces diverse perturbation mechanisms to generate distinct gene subsets. The partial least squares-based multiple gene measurement algorithm considers gene-phenotype and gene-gene correlations, identifying differentially expressed genes, including those with weak signals. The constructed gene networks derived from the augmented subsets unveil regulatory patterns, enabling association analysis to explore gene associations comprehensively. Our algorithm excels in identifying small-sized gene subsets with strong discriminative power, surpassing traditional methods that yield a single gene subset. Unlike conventional approaches, our algorithm reveals a spectrum of different gene subsets and their weakly differentially expressed genes. This nuanced perspective aids in unraveling the molecular characteristics and specific expression patterns of tumor genes. The versatility of our approach not only contributes to the advancement of tumor-specific gene identification but also holds promise for addressing challenges in various fields characterized by high-dimensional datasets and limited samples. The Python implementation is available at http://github.com/wenjieyou/PLSGSA.
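The core idea in the abstract, perturbing the data to obtain several distinct small gene subsets and scoring genes with partial least squares, can be illustrated with a minimal sketch. This is not the authors' PLSGSA implementation (see the repository linked above for that). It assumes a binary phenotype, uses bootstrap resampling of samples plus random gene pools as the perturbation, and swaps in the standard VIP (variable importance in projection) score as the PLS-based gene measure. The function names `pls1_vip` and `augmented_subsets`, all parameter names and defaults, and the synthetic data are hypothetical.

```python
from collections import Counter

import numpy as np


def pls1_vip(X, y, n_components=2):
    """VIP scores from a single-response PLS fit (NIPALS). Illustrative only."""
    X = X - X.mean(axis=0)          # center genes (creates a copy)
    y = y - y.mean()                # center phenotype
    p = X.shape[1]
    W = np.zeros((p, n_components))
    ss = np.zeros(n_components)     # y-variance explained by each component
    for a in range(n_components):
        w = X.T @ y                 # gene-phenotype covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = X @ w                   # latent score
        tt = t @ t
        if tt < 1e-12:
            break
        loading = X.T @ t / tt
        q = (y @ t) / tt
        X = X - np.outer(t, loading)  # deflation carries gene-gene structure
        y = y - q * t
        W[:, a] = w
        ss[a] = q ** 2 * tt
    total = ss.sum()
    if total == 0:
        return np.zeros(p)
    return np.sqrt(p * (W ** 2 @ ss) / total)


def augmented_subsets(X, y, n_subsets=20, gene_frac=0.5, top_k=5, seed=0):
    """Generate several small gene subsets from perturbed views of the data.

    Assumed perturbations: bootstrap the samples and draw a random gene pool.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pool_size = min(p, max(top_k, int(gene_frac * p)))
    subsets = []
    for _ in range(n_subsets):
        rows = rng.choice(n, size=n, replace=True)
        if np.unique(y[rows]).size < 2:
            continue                # skip a resample that lost one class
        cols = rng.choice(p, size=pool_size, replace=False)
        vip = pls1_vip(X[np.ix_(rows, cols)], y[rows])
        top = cols[np.argsort(vip)[::-1][:top_k]]
        subsets.append(tuple(sorted(top.tolist())))
    return subsets


# Synthetic high-dimensional, small-sample data: 40 samples, 200 genes,
# with genes 0-2 shifted between the two phenotype classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[:, :3] += 2.0 * y[:, None]

subsets = augmented_subsets(X, y)
counts = Counter(gene for subset in subsets for gene in subset)
print(counts.most_common(5))
```

Counting how often each gene recurs across the perturbed subsets gives a simple stability signal. Genes that recur only occasionally are candidates for the weakly differentially expressed genes the abstract mentions, although the paper's own criteria may differ.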
Keywords
Gene association analysis, gene (feature) subset augmentation, high-dimensional small sample, tumor-specific genes, weakly differentially expressed genes