SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data

BIOINFORMATICS(2023)

Abstract
Motivation: Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories.

Results: To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification.

Availability and implementation: https://github.com/bixuehua/SSLpheno.
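
The filtering and pair-selection steps described in the Results can be illustrated with a minimal sketch. This is not the SSLpheno implementation (see the repository above for that); it only shows the general idea of smoothing node features with a Laplacian-based low-pass filter and then ranking node pairs by cosine similarity to pick positive and negative samples. The function names, the filter form (I - step * L)^k, the added self-loops, the values of `k` and `step`, and the toy graph are all assumptions made for illustration.

```python
import numpy as np

def laplacian_smooth(adj, feats, k=2, step=0.5):
    """Apply the low-pass filter (I - step * L_sym)^k to node features.

    The filter form, k, and step are illustrative assumptions.
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                      # self-loops avoid zero degrees (assumption)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    lap = np.eye(n) - a_norm                 # symmetric normalized Laplacian
    filt = np.eye(n) - step * lap
    out = feats.astype(float)
    for _ in range(k):
        out = filt @ out
    return out

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def select_pairs(sim, n_pos, n_neg):
    """Return the n_pos most similar and n_neg least similar pairs (i < j)."""
    iu, ju = np.triu_indices(sim.shape[0], k=1)
    order = np.argsort(sim[iu, ju])          # ascending similarity
    neg = [(int(iu[t]), int(ju[t])) for t in order[:n_neg]]
    pos = [(int(iu[t]), int(ju[t])) for t in order[::-1][:n_pos]]
    return pos, neg

# Toy attributed network: two connected pairs of nodes (0-1 and 2-3).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [0.0, 1.0],
                  [0.1, 0.9]])

smoothed = laplacian_smooth(adj, feats)
sim = cosine_similarity_matrix(smoothed)
pos_pairs, neg_pairs = select_pairs(sim, n_pos=2, n_neg=1)
```

In this toy graph the two linked pairs end up as the positive samples and a cross-cluster pair as the negative sample. In a self-supervised setup, such pairs would serve as pseudo-labels when training the node representation; how many pairs to keep and how to threshold them are design choices the abstract does not specify.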
Key words
Genomic Data Integration, Phenotype Ontology, Gene Set Enrichment Analysis, Gene Annotation