Identifying Factors Important for Conservation at Sites of Synonymous Variations

Abhirami Ramasubramanian, Uma Sunderam, Rajgopal Srinivasan

bioRxiv (2024)

Abstract
Synonymous mutations can have a deleterious effect leading to disease, even though they do not alter the protein sequence. Genomic sites at which synonymous variants occur are frequently highly conserved across species. Several prediction methods have been developed to assess the impact of synonymous mutations, and they depend heavily on validated sets of both deleterious and benign synonymous mutations. However, validated data for deleterious synonymous mutations are sparse, unlike those for missense mutations. Rather than develop a model for predicting the pathogenicity of synonymous variants, we seek to understand the relative importance of various factors that lead to conservation at sites of synonymous variants. We built machine learning models using various features on a large set of reported and generated synonymous variants (Zeng Z et al., 2019) to predict conservation at genomic sites, measured by Genomic Evolutionary Rate Profiling – Rejected Substitution (GERP RS) base scores and Phylogenetic p-values for 100 vertebrates (PP100). We used the extreme gradient boosting classifier to classify sites as high, medium, or low conservation at different cutoffs. Our experiments report an AUC between 0.74 and 0.79, and the sensitivity was significant. Of the features we explored, a few alternate-allele-independent properties were repeatedly flagged as having high impact. These findings provide information that predictors can use to further improve models of synonymous variant impact.

Competing Interest Statement: The authors have declared no competing interest.
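The labelling and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the cutoffs (0.0 and 4.0), the scores, and the predicted probabilities are all hypothetical, and the gradient boosting model itself is left out. The sketch only shows how a conservation score might be binned into low, medium, and high classes at a chosen pair of cutoffs, and how a one-vs-rest AUC for the "high" class could be computed from predicted probabilities.

```python
from typing import Sequence


def label_conservation(score: float, low_cut: float, high_cut: float) -> str:
    """Bin a conservation score (e.g. GERP RS) into 'low', 'medium', or 'high'.

    The cutoffs are hypothetical; the abstract only says several were tried.
    """
    if low_cut >= high_cut:
        raise ValueError("low_cut must be smaller than high_cut")
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def roc_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical conservation scores for five sites and hypothetical cutoffs.
site_scores = [-3.2, 0.5, 2.1, 4.8, 5.6]
labels = [label_conservation(s, low_cut=0.0, high_cut=4.0) for s in site_scores]

# Hypothetical model probabilities that each site belongs to the 'high' class.
prob_high = [0.10, 0.40, 0.35, 0.80, 0.30]
auc_high = roc_auc([lab == "high" for lab in labels], prob_high)
print(labels, round(auc_high, 3))
```

Repeating the binning with different cutoff pairs changes the class balance, which is one reason an AUC reported "at different cutoffs" can span a range rather than a single value.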