Do available protein 3D structures reflect human genetic and functional diversity?

bioRxiv (2019)

Abstract
Genomic databases are substantially biased towards European ancestry populations, and this bias contributes to health disparities. Here, we quantify how well 66,971 experimentally characterized human protein 3D structures represent the diversity of protein sequences observed across the 1000 Genomes Project. More than 85% of available structures fail to match the sequence observed in at least one individual, and on average a structure matches the sequence of 74% of individuals. Nearly 23% of human structures do not match any observed sequence; however, after masking engineered/known mutations, this decreases to ~4%. African ancestry sequences are modestly, but significantly, less likely to be represented by structures (73.5% vs. 74.0%). These differences are mainly driven by the greater genetic diversity of African populations. We identify thousands of variants unrepresented in available structures that influence protein structure and function. Thus, the use of a single structure as representative of “the wild type” protein will often bias results against many individuals. The diversity of protein sequence and structure must be considered to enable accurate, reproducible, and generalizable conclusions from structural analyses.
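The per-structure statistic in the abstract (the share of individuals whose sequence a structure matches, with and without masking engineered or known mutations) can be illustrated with a small sketch. This is not the authors' pipeline: the function names `matches` and `fraction_matched`, the toy sequences, and the assumption that sequences are already aligned and of equal length are all hypothetical choices made for illustration.

```python
from typing import Collection, Iterable


def matches(structure_seq: str, individual_seq: str,
            masked: Collection[int] = frozenset()) -> bool:
    """Return True if the two sequences agree at every unmasked position.

    Assumption: both sequences are pre-aligned and cover the same residues,
    so a length mismatch is treated as a non-match.
    """
    if len(structure_seq) != len(individual_seq):
        return False
    return all(
        a == b
        for i, (a, b) in enumerate(zip(structure_seq, individual_seq))
        if i not in masked
    )


def fraction_matched(structure_seq: str, individual_seqs: Iterable[str],
                     masked: Collection[int] = frozenset()) -> float:
    """Fraction of individuals whose sequence the structure matches."""
    seqs = list(individual_seqs)
    if not seqs:
        raise ValueError("need at least one individual sequence")
    hits = sum(matches(structure_seq, s, masked) for s in seqs)
    return hits / len(seqs)


# Toy data (hypothetical): the structure carries an engineered residue
# at 0-based position 3 that no individual has.
structure = "MKTAYL"
individuals = ["MKTGYL", "MKTGYL", "MKTGFL", "MKTGYL"]

unmasked = fraction_matched(structure, individuals)
masked = fraction_matched(structure, individuals, masked={3})
print(unmasked, masked)
```

In this toy case the structure matches no individual until the engineered position is masked, which mirrors the direction of the drop from nearly 23% to ~4% of structures with no matching sequence reported in the abstract.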