Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment

PLOS COMPUTATIONAL BIOLOGY (2023)

Abstract
We present here an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during the RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). Our approach is applied to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences differing from the wild type at as many as 50 positions (20% of the protein domain) retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed improved activity compared with the original wild-type protein sequence. These results demonstrate the value of further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.

Proteins are essential molecules in all living organisms, with their function largely determined by their sequence. Modifying a protein's sequence to achieve a desired function remains a challenging endeavor, requiring careful consideration of factors such as the stability of the structure and interactions with molecular partners. In this study, we devised a protein design method that combines insights from experimental data, natural variation in protein sequences, and physics-based predictions. This approach provides a reliable and interpretable means of altering a protein's sequence while maintaining its functionality. We applied our technique to a domain of the Cas9 protein, a key component in the CRISPR gene editing system. Our results demonstrate the possibility of generating functional protein domains with over 20% of their sequence modified. These findings underscore how the integration of diverse sources of information in a unified design process enhances the quality of engineered proteins. This advancement holds promise for creating valuable protein variants for applications in drug development and various industries.
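
To make the notion of an "RBM energy" for a sequence more concrete, the following is a minimal sketch of how a generic RBM can score an aligned protein sequence. It is not the authors' implementation: the abstract does not specify the hidden-unit type or the training procedure, so Bernoulli hidden units are assumed here, and the names `fields`, `weights`, and `hidden_bias` are hypothetical, filled with random values in place of a trained model. The `hamming` helper illustrates how the number of differences to the wild type could be counted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus the alignment gap
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode an aligned sequence as an (L, 21) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[a] for a in seq]] = 1.0
    return x


def rbm_free_energy(seq, fields, weights, hidden_bias):
    """Free energy of a sequence under an RBM (assumed Bernoulli hidden units).

    fields:      (L, 21) per-site residue biases
    weights:     (M, L, 21) couplings between M hidden units and each site
    hidden_bias: (M,) hidden-unit biases
    Lower free energy corresponds to higher model probability.
    """
    x = one_hot(seq)
    visible_term = np.sum(fields * x)
    # Total input received by each hidden unit from the sequence.
    hidden_input = np.tensordot(weights, x, axes=([1, 2], [0, 1])) + hidden_bias
    # Hidden units are summed out analytically: sum of log(1 + exp(input)).
    hidden_term = np.sum(np.logaddexp(0.0, hidden_input))
    return -visible_term - hidden_term


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wild_type = "MKVLAT"  # toy sequence, not the Cas9 domain
    variant = "MKILGT"
    L, M = len(wild_type), 4
    fields = rng.normal(size=(L, 21))          # placeholder parameters
    weights = rng.normal(scale=0.1, size=(M, L, 21))
    hidden_bias = np.zeros(M)
    for name, seq in [("wild type", wild_type), ("variant", variant)]:
        print(name, rbm_free_energy(seq, fields, weights, hidden_bias),
              hamming(seq, wild_type))
```

In a design setting along the lines described above, a score of this kind could be considered alongside an externally computed structural energy (such as one from FoldX) when choosing candidates; how the two are actually combined is not specified in the abstract.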