Decoding sequence-level information to predict membrane protein expression

bioRxiv(2017)

引用 1|浏览16
暂无评分
摘要
The expression and purification of integral membrane proteins remains a major bottleneck in the characterization of these important proteins. Expression levels are currently unpredictable, which renders the pursuit of these targets challenging and highly inefficient. Evidence demonstrates that small changes in the nucleotide or amino-acid sequence can dramatically affect membrane protein biogenesis; yet these observations have not resulted in generalizable approaches to improve expression. In this study, we develop a data-driven statistical model that predicts membrane protein expression in E. coli directly from sequence. The model, trained on experimental data, combines a set of sequence-derived variables resulting in a score that predicts the likelihood of expression. We test the model against various independent datasets from the literature that contain a variety of scales and experimental outcomes demonstrating that the model significantly enriches expressed proteins. The model is then used to score expression for membrane proteomes and protein families highlighting areas where the model excels. Surprisingly, analysis of the underlying features reveals an importance in nucleotide sequence-derived parameters for expression. This computational model, as illustrated here, can immediately be used to identify favorable targets for characterization.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要