Joint Generation of Protein Sequence and Structure with RoseTTAFold Sequence Space Diffusion

Sidney Lyayuga Lisanza, Jake Merle Gershon,Sam Tipps, Lucas Arnoldt, Samuel Hendel, Jeremiah Nelson Sims,Xinting Li,David Baker

biorxiv(2023)

引用 3|浏览5
暂无评分
摘要
Protein denoising diffusion probabilistic models (DDPMs) show great promise in the de novo generation of protein backbones but are limited in their inability to guide generation of proteins with sequence specific attributes and functional properties. To overcome this limitation, we develop ProteinGenerator, a sequence space diffusion model based on RoseTTAfold that simultaneously generates protein sequences and structures. Beginning from random amino acid sequences, our model generates sequence and structure pairs by iterative denoising, guided by any desired sequence and structural protein attributes. To explore the versatility of this approach, we designed proteins enriched for specific amino acids, with internal sequence repeats, with masked bioactive peptides, with state dependent structures, and with key sequence features of specific protein families. ProteinGenerator readily generates sequence-structure pairs satisfying the input conditioning (sequence and/or structural) criteria, and experimental validation showed that the designs were monomeric by size exclusion chromatography (SEC), had the desired secondary structure content by circular dichroism (CD), and were thermostable up to 95°C. By enabling the simultaneous optimization of both sequence and structure, ProteinGenerator allows for the design of functional proteins with specific sequence and structural attributes, and paves the way for protein function optimization by active learning on sequence-activity datasets. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
rosettafold sequence space diffusion,protein sequence,sequence space
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要