Improving the prediction accuracy of protein abundance in Escherichia coli using mRNA accessibility.

NUCLEIC ACIDS RESEARCH(2020)

引用 17|浏览24
暂无评分
摘要
RNA secondary structure around translation initiation sites strongly affects the abundance of expressed proteins in Escherichia coli. However, detailed secondary structural features governing protein abundance remain elusive. Recent advances in high-throughput DNA synthesis and experimental systems enable us to obtain large amounts of data. Here, we evaluated six types of structural features using two large-scale datasets. We found that accessibility, which is the probability that a given region around the start codon has no base-paired nucleotides, showed the highest correlation with protein abundance in both datasets. Accessibility showed a significantly higher correlation (Spearman's rho = 0.709) than the widely used minimum free energy (0.554) in one of the datasets. Interestingly, accessibility showed the highest correlation only when it was calculated by a log-linear model, indicating that the RNA structural model and how to utilize it are important. Furthermore, by combining the accessibility and activity of the Shine-Dalgarno sequence, we devised a method for predicting protein abundance more accurately than existing methods. We inferred that the log-linear model has a broader probabilistic distribution than the widely used Turner energy model, which contributed to more accurate quantification of ribosome accessibility to translation initiation sites.
更多
查看译文
关键词
mrna accessibility,protein abundance,escherichia coli,prediction accuracy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要