Correlational networking guides the discovery of cryptic proteases for lanthipeptide maturation

semanticscholar(2021)

Abstract
Proteases required for lanthipeptide maturation are not encoded in many of their respective biosynthetic gene clusters. These cryptic proteases hinder the study and application of lanthipeptides as promising drug candidates. Here, we establish a global correlation network bridging the gap between lanthipeptide precursors and cryptic proteases. Applying our analysis to 161,954 bacterial genomes, we establish 6,041 correlations between precursors and cryptic proteases, with 91 prioritized. We use network predictions and co-expression analysis to reveal a previously missing protease for the maturation of the class I lanthipeptide paenilan. We further discover widely distributed bacterial M16B metallopeptidases of previously unclear biological function as a new family of lanthipeptide proteases. We show the involvement of a pair of bifunctional M16B proteases in the production of new class III lanthipeptides with high substrate specificity. Together, these results demonstrate the strength of our correlational networking approach to the discovery of cryptic lanthipeptide proteases.

Introduction
The increasing resistance of bacteria to conventional antimicrobial therapy has created an urgent need for novel antibiotics, including lantibiotics, a type of lanthipeptide that has exhibited antimicrobial activity against a range of multi-drug-resistant (MDR) bacteria. Lanthipeptides also possess anti-fungal, anti-HIV, and antinociceptive activities, attracting growing research interest in their discovery and application. Lanthipeptides are among the most common ribosomally synthesized and post-translationally modified peptides (RiPPs), and the biosynthesis of all four classes involves a crucial protease-mediated cleavage between the leader and core peptides for final maturation. However, only two types of proteases have been relatively well studied.
The first is the subtilisin-like serine protease LanP, employed by class I and II lanthipeptides. The second is the papain-like cysteine protease domain of the LanT transporter protein, involved exclusively in the biosynthesis of class II lanthipeptides. Because protease-encoding genes are absent from most characterized class III and IV lanthipeptide biosynthetic gene clusters (BGCs), the maturation of these two classes is barely understood, with FlaP and AplP only recently reported as potential proteases for class III lanthipeptides.

As the number of available microbial genome sequences grows exponentially, a growing number of lanthipeptide BGCs are being identified that lack BGC-associated protease-encoding genes. Thus, a missing link between these BGCs and their cryptic proteases hinders the discovery, biosynthetic study, heterologous production, and bioengineering of these potentially bioactive lanthipeptides.

We hypothesize that lanthipeptide BGCs without any colocalized protease-encoding genes may rely on proteases encoded elsewhere in the genome. Thus, in this study, we developed a genome mining workflow and used correlation analysis complemented by co-expression analysis to establish the first global correlation network between lanthipeptide precursor peptides and proteases from 161,954 bacterial genome sequences. This correlation network provides guidance for targeted discovery of cryptic lanthipeptide proteases encoded by genes outside of the BGCs. As a proof of principle, we selected two representative correlations from the network for study, leading to the simultaneous discovery of new lanthipeptides and the cryptic proteases responsible for their maturation. In particular, a family of bacterial M16B metallopeptidases with previously unclear biological functions was identified as being responsible for the maturation of several new class III lanthipeptides.

Results
Establishment of a global correlation network between lanthipeptide precursor peptides and proteases.
We established a global precursor-protease network for all four classes of lanthipeptides using 161,954 bacterial genomes obtained from the NCBI RefSeq database. Analyzing these genomes with antiSMASH 5.0, we identified 21,225 putative lanthipeptide BGCs widely distributed across all bacterial taxa. These BGCs harbor 29,489 highly diverse precursors (Supplementary Fig. 1). We observed that approximately one third of lanthipeptide BGCs do not harbor any protease genes, especially in class III systems (Supplementary Fig. 2). BGCs without any colocalized protease-encoding genes may rely on proteases encoded outside of those BGCs for leader peptide removal. In this scenario, we hypothesized that the specificity between precursors and corresponding proteases still exists, at least to some extent, based on two observations: (i) only certain homologs of a protease in a genome have proteolytic activity against a specific precursor, and (ii) the proteolytic activity is affected by core peptide modifications. Thus, based on the specificity between precursors and proteases, we performed a global correlation analysis, aiming to identify potential lanthipeptide maturation proteases regardless of whether their genes are located inside or outside of a BGC.

Because the genomes contain a large number of proteases that perform general functions, it was neither rational nor practical to directly correlate lanthipeptide precursors to all the proteases in the genomes. Thus, we started with pathway-specific proteases encoded by lanthipeptide BGCs. We hypothesized that pathway-specific proteases likely evolved from general proteases encoded elsewhere in the genome and that, at the large scale of the dataset, pathway-specific proteases could collectively represent all the functional domains of cryptic proteases encoded outside of the BGCs.
Indeed, by searching for proteases in the 21,225 putative lanthipeptide BGCs, we generated a library of 44,260 prospective lanthipeptide proteases, representing 120 unique Pfam domains (Supplementary Table 1). In contrast, previously characterized proteases associated with lanthipeptides only encompass 6 Pfam domains. We used these 120 Pfam domains to search for proteases in the full set of 161,954 bacterial genomes, resulting in 23,777,967 putative lanthipeptide-related proteases. Grouping these proteases by sequence similarity using MMseqs2 yielded 288,416 groups of proteases. Among these, 10,263 groups each containing 100 or more members were selected for downstream analysis. We also applied the same clustering approach to the 29,489 lanthipeptide precursors, forming 4,527 groups. Among these, 263 groups each containing ten or more precursors were selected for downstream analysis (Fig. 1a).

We then sought to identify links between the selected groups of proteases and precursors. To reduce the effect of phylogenetic relatedness, we performed Spearman rank correlation analysis individually for each genus, leading to the identification of 6,041 significant correlations (ρ>0.3, pAdj<1E-5, Fig. 1b) between precursors and proteases. These significant correlations spanned 6 phyla and 114 genera, suggesting that widely distributed proteases, even those encoded by genes outside of the BGCs, may function against specific lanthipeptide precursors.

We next focused on class III lanthipeptides due to their elusive maturation process. We identified 1,833 putative precursors encoding class III lanthipeptides. These precursor peptides were classified into 217 groups based on sequence similarity, including 18 groups that each contained at least 10 members.
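The per-genus Spearman analysis described above can be illustrated with a minimal sketch. It assumes that each genome contributes one pair of values (the number of members of a precursor group and of a protease group found in that genome); the text does not state the exact variables, so this input layout, the function names, and the toy data are all hypothetical. The adjusted p-value (pAdj) is not computed here because the correction method is not given in this section.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when one variable is constant
    return cov / sqrt(vx * vy)


def rho_by_genus(records):
    """records: iterable of (genus, precursor_count, protease_count) per genome."""
    by_genus = defaultdict(lambda: ([], []))
    for genus, precursor_count, protease_count in records:
        by_genus[genus][0].append(precursor_count)
        by_genus[genus][1].append(protease_count)
    return {genus: spearman_rho(x, y) for genus, (x, y) in by_genus.items()}


# Made-up genomes for illustration only (not data from the study).
toy_records = [
    ("GenusA", 0, 0), ("GenusA", 1, 1), ("GenusA", 2, 3), ("GenusA", 3, 5),
    ("GenusB", 1, 2), ("GenusB", 1, 0), ("GenusB", 1, 1),
]
result = rho_by_genus(toy_records)
print(result)
```

Computing the coefficient separately within each genus, rather than once over all genomes, mirrors the stated goal of reducing the effect of phylogenetic relatedness: a pair of groups that merely co-occur in the same lineage does not gain a high ρ from between-genus differences alone.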
We selected the correlations identified in at least ten genomes (I>=10) for further analysis, leading to the prioritization of 91 significant correlations (ρ>0.3, pAdj<1E-5, I>=10) between 8 groups of precursors and 87 groups of proteases. These significant correlations were distributed across two phyla, with 80 correlations in Actinobacteria and 11 correlations in Firmicutes. The core information of these prioritized significant correlations is summarized in Fig. 1c, Supplementary Fig. 3, and Supplementary Table 2, with some representative discoveries described below.

Metallopeptidases appeared in many of the 91 significant correlations, suggesting that the metallopeptidase superfamily may play an important role in the maturation of class III lanthipeptides. In contrast, serine and cysteine protease families have been reported for the maturation of class I and II lanthipeptides.

Remarkably, among the 91 significant correlations representing 864 precursors, 758 precursors (88%) were strongly correlated with only one or two groups of proteases. For example, the precursors of groups Pre_5, Pre_24, and Pre_115 were only correlated to two, two, and one group(s) of proteases (ρ>0.3, pAdj<1E-5, I>=10; Fig. 1c and Supplementary Table 2) at the genus level of Streptomyces, Paenibacillus, and Lentzea, respectively. None of these 888 candidate proteases had been characterized and only 19 of them are encoded by genes within the corresponding BGCs. Thus, this result demonstrates the potential of our correlation network for identifying cryptic proteases for the maturation of class III lanthipeptides. On the other hand, five groups of precursors, Pre_77, Pre_117, Pre_134, Pre_181, and Pre_228, were each correlated to multiple groups of proteases, forming five multiple-correlation clusters at the genus level of Amycolatopsis, Streptomyces, Alkalihalobacillus, Lactobacillus, and Rhodococcus, respectively (Fig. 1c, Supplementary Fig. 3).
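The prioritization step above amounts to filtering correlation records on three thresholds (ρ>0.3, pAdj<1E-5, I>=10). The following sketch shows that filter, assuming each correlation is stored as a simple record; the `Correlation` class, its field names, and the group labels in the toy data are hypothetical and are not taken from the study. It also checks the reported arithmetic (80 + 11 = 91 correlations; 758 of 864 precursors is about 88%).

```python
from dataclasses import dataclass


@dataclass
class Correlation:
    precursor_group: str
    protease_group: str
    rho: float       # Spearman correlation coefficient
    p_adj: float     # adjusted p-value
    n_genomes: int   # number of genomes supporting the correlation ("I" in the text)


def prioritize(correlations, min_rho=0.3, max_p_adj=1e-5, min_genomes=10):
    """Keep correlations that pass all three thresholds quoted in the text."""
    return [
        c for c in correlations
        if c.rho > min_rho and c.p_adj < max_p_adj and c.n_genomes >= min_genomes
    ]


# Made-up records for illustration only (hypothetical group names and values).
toy = [
    Correlation("Pre_X", "Prot_A", 0.62, 1e-9, 25),   # passes all thresholds
    Correlation("Pre_X", "Prot_B", 0.45, 1e-7, 4),    # too few genomes
    Correlation("Pre_Y", "Prot_C", 0.25, 1e-12, 40),  # rho too low
    Correlation("Pre_Y", "Prot_D", 0.80, 1e-3, 30),   # adjusted p-value too high
]
kept = prioritize(toy)
print([c.protease_group for c in kept])

# Consistency checks on the figures reported in the text.
total_prioritized = 80 + 11                # Actinobacteria + Firmicutes
share_percent = round(100 * 758 / 864)     # precursors with one or two protease groups
print(total_prioritized, share_percent)
```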
Together, these five groups account for only the remaining 12% of the precursors from the 91 significant correlations. Some groups of proteases, e.g. Prot_1169, Prot_2308, and Prot_9513, were observed to share similar functional domains, which may partially account for their simultaneous correlations with the same precursor group. At first glance, this multiple-correlation pattern made it challenging to identify the responsible protease. However, the correlation strength betwee