iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases.

Molecular Plant (2016)

Abstract
Transcription factors (TFs) are proteins that regulate the expression of target genes by binding to specific cis-elements in promoter regions. Transcriptional regulators (TRs) also regulate the expression of target genes; however, they act indirectly, either by interacting with the basal transcription apparatus (e.g., TFs) or by altering the accessibility of DNA to TFs through chromatin remodeling. Protein kinases (PKs), another type of regulatory protein, function in signal transduction pathways and alter the activity of target proteins by phosphorylating them. These three important classes of regulatory proteins have been associated with numerous aspects of plant growth and development (Gapper et al., 2014; Xu and Zhang, 2015) and with responses to biotic and abiotic stimuli (Zhang et al., 2013; Mickelbart et al., 2015). Effective and accurate identification and classification of these genes are important for understanding their evolution, biological functions, and regulatory networks. Currently, more than 100 plant genomes have been sequenced, and regulatory proteins have been systematically identified in several of them.
Databases of these regulatory proteins, especially TFs, have been developed, such as PlnTFDB (Pérez-Rodríguez et al., 2010) and PlantTFDB (Jin et al., 2013). However, the annotations of TF/TR families and the associated classification rules have been inconsistent among studies. For example, PlantTFDB does not include the TRs that are present in PlnTFDB. As another example, the forbidden domain of the C2H2 family (a domain that members of a given TF family must not contain) is annotated as an RNase_T domain in PlantTFDB but as a PHD domain in PlnTFDB. Although the collection of genome sequences is expanding rapidly, the TFs/TRs cataloged and annotated in different databases vary because of inconsistent identification and characterization criteria, with serious consequences for both genome-scale and targeted analyses. Furthermore, although many studies have focused on specific families of plant regulators, computational tools for identifying and classifying these regulatory proteins on a genome scale are very limited. In this study, we systematically compared the TF/TR classification rules used in different databases and derived, from the available literature, a set of consensus rules for accurate identification and classification of plant TFs/TRs. For plant PKs, we directly used the HMM profiles developed by Lehti-Shiu and Shiu (2012) to provide a comprehensive classification system.
These consensus rules for TF/TR classification and HMM profiles for PK classification were implemented in iTAK (http://bioinfo.bti.cornell.edu/tool/itak), a computational program that provides consistent and uniform identification and classification of plant TFs, TRs, and PKs. To construct consensus rules for TF/TR prediction and classification, we compared the Pfam domain assignment rules of PlnTFDB and PlantTFDB. The families in PlantTFcat (Dai et al., 2013) and AtTFDB (Yilmaz et al., 2011) were used as supporting evidence because these resources use different methods for domain identification and therefore cannot be compared directly with PlnTFDB and PlantTFDB. A family annotation was considered more reliable if it had been assigned in both PlnTFDB and PlantTFDB, whereas a family unique to a single database was considered less reliable and required additional evidence to support its identity. Under this criterion, 57 TF families/subfamilies were considered reliable and 25 less reliable (Supplemental Table 1). Comparison of the domain assignment rules for the reliable families indicated that most were consistent between PlnTFDB and PlantTFDB, but the rules for several subfamilies were missing from PlnTFDB. For example, PlantTFDB divides the AP2/ERF family into three subfamilies (AP2, ERF, and RAV), whereas PlnTFDB defines only a single AP2-EREBP family.
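The reliability criterion above amounts to a set comparison of family names across the two databases. A minimal Python sketch, assuming the family names from both databases have already been harmonized (PlnTFDB's AP2-EREBP versus PlantTFDB's AP2/ERF, for example, would need mapping first); the function `classify_reliability` and the toy inputs are hypothetical and not part of iTAK:

```python
def classify_reliability(plntfdb: set[str], planttfdb: set[str]) -> dict[str, str]:
    """Label each family 'reliable' if both databases define it,
    'less reliable' if only one of them does."""
    # Intersection: families assigned in both databases.
    labels = {family: "reliable" for family in plntfdb & planttfdb}
    # Symmetric difference: families defined in exactly one database.
    labels.update({family: "less reliable" for family in plntfdb ^ planttfdb})
    return labels


if __name__ == "__main__":
    # Toy inputs for illustration only, not the actual database contents.
    pln = {"MYB", "WRKY", "TAZ"}
    plant = {"MYB", "WRKY", "FamilyX"}  # FamilyX is a placeholder name
    for family, label in sorted(classify_reliability(pln, plant).items()):
        print(f"{family}: {label}")
```

In practice the "less reliable" label only flags a family for further literature review; it does not by itself exclude the family.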
In this example, the domain assignment rules used in PlantTFDB capture the relationship between the AP2/ERF superfamily and its subfamilies in more detail. We therefore adopted the rules for both the AP2/ERF superfamily and its three subfamilies (Figure 1A). Similarly, the rules for the NF-Y (CCAAT) and MADS families were adopted from PlantTFDB because they provide a more detailed TF classification. In addition to filling in rules missing from either PlnTFDB or PlantTFDB, we updated the domain assignment rules for several families, including Homeobox (HB), BSD, and LIM, based on a literature review (Figure 1A). In PlantTFDB, the HB superfamily is divided into five subfamilies: HD-ZIP, TALE, WOX, HB-PHD, and HB-other. In our consensus rule set, HB-TALE was further divided into two subfamilies, HB-BELL and HB-KNOX, because their members have different domains: HB-BELL proteins contain POX and HB-KN domains, whereas HB-KNOX proteins have KNOX1 and KNOX2 domains. In addition, the HD-ZIP_I/II domain that typifies the HB-HD-ZIP subfamily was replaced by the HALZ domain, which was built specifically from homeodomain-leucine zipper proteins. We also updated the classification rule for the BSD family to require both a BSD and a PH_TFIIH domain, instead of only the BSD domain as required by PlantTFDB and PlnTFDB. Finally, the LIM subfamily was updated to require two LIM domains (Weiskirchen and Günther, 2003) (Figure 1A; Supplemental Table 1). In reviewing the literature for the 25 TF families supported by only one of PlantTFDB and PlnTFDB, we found that six (WD40-like, TIG, FHA, Sigma70-like, TAZ, and mTERF) had been inaccurately categorized as TFs (Supplemental Table 1).
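The updated rules can be read as required domains with minimum copy numbers plus forbidden domains. The sketch below encodes a few of the rules described above as Python data and checks a protein's list of domain hits against them. It is a simplified illustration, not iTAK's actual rule file: the `FamilyRule` class and `classify` function are hypothetical, the domain labels follow the prose (e.g. `HB-KN`) rather than exact Pfam identifiers, and the C2H2 entry uses the PlantTFDB-style RNase_T forbidden domain purely as an example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FamilyRule:
    """Hypothetical rule: minimum copies of each required domain,
    plus domains whose presence rules the family out."""
    name: str
    required: dict[str, int]
    forbidden: set[str] = field(default_factory=set)

    def matches(self, domains: list[str]) -> bool:
        counts = Counter(domains)  # domain name -> number of hits
        has_required = all(counts[d] >= n for d, n in self.required.items())
        has_forbidden = any(counts[d] > 0 for d in self.forbidden)
        return has_required and not has_forbidden


# A few rules paraphrased from the text; labels are illustrative, not exact Pfam IDs.
RULES = [
    FamilyRule("BSD", {"BSD": 1, "PH_TFIIH": 1}),     # both domains now required
    FamilyRule("LIM", {"LIM": 2}),                    # two LIM domains required
    FamilyRule("HB-BELL", {"POX": 1, "HB-KN": 1}),
    FamilyRule("HB-KNOX", {"KNOX1": 1, "KNOX2": 1}),
    FamilyRule("C2H2", {"zf-C2H2": 1}, {"RNase_T"}),  # example forbidden domain
]


def classify(domains: list[str]) -> list[str]:
    """Return the names of all rules that a protein's domain hits satisfy."""
    return [rule.name for rule in RULES if rule.matches(domains)]


if __name__ == "__main__":
    print(classify(["BSD"]))                 # -> []
    print(classify(["BSD", "PH_TFIIH"]))     # -> ['BSD']
    print(classify(["LIM", "LIM"]))          # -> ['LIM']
    print(classify(["zf-C2H2", "RNase_T"]))  # -> []
```

Under these rules, a protein carrying only a BSD domain, or only a single LIM domain, no longer matches the corresponding family, which is the tightening the text describes.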
We excluded WD40-like because WD-repeat proteins perform diverse functions, and the existing rule would identify many non-TF WD40-like proteins. Multiple sequence alignments of TIG proteins could not readily distinguish TFs from other members of the TIG superfamily, which includes not only TFs but also kinases and membrane proteins. Similarly, the FHA domain is present in a functionally diverse range of proteins that includes not only TFs but also kinases, phosphatases, and kinesins, and no plant TFs containing an FHA domain have been found. Sigma70-like proteins were also excluded from the TF category because they do not bind promoters themselves; rather, they function as components of an RNA polymerase holoenzyme involved in binding the core RNA polymerase to specific promoters. Finally, mTERF and TAZ were categorized as TRs rather than TFs: TAZ proteins function as coactivators alongside other regulatory proteins, and mTERF proteins exert a broad range of regulatory activities without binding directly to promoters. To achieve low and balanced false-positive and false-negative rates, we excluded these six families from the rule set that defines TFs. In addition, the DDT family, which PlnTFDB categorizes as a TR, was categorized as a TF. Based on a literature review, the remaining 16 TF families were included, yielding a set of consensus rules covering 72 families/subfamilies for plant TF classification (Supplemental Table 1). The TR family classification rules were adopted from PlnTFDB, with support from PlantTFcat. Excluding the aforementioned DDT family, a total of 23 TR families were derived from PlnTFDB, including TAZ and mTERF, which had been incorrectly classified as TFs. All of these TR families were reviewed and accepted based on literature support (Supplemental Table 2).
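The curation described above combines three distinct operations: dropping families from the TF rules, moving families from TF to TR, and moving one family from TR to TF. A small Python sketch of those operations; the `curate` function and the toy inputs are hypothetical, and only the family names listed in the text are encoded, so this does not reproduce the full TF or TR rule sets:

```python
# Families named in the text; the full rule sets contain many more.
EXCLUDED = {"WD40-like", "TIG", "FHA", "Sigma70-like"}  # dropped from the TF rules
TF_TO_TR = {"TAZ", "mTERF"}                             # recategorized as TRs
TR_TO_TF = {"DDT"}                                      # recategorized as a TF


def curate(tf_families: set[str], tr_families: set[str]) -> tuple[set[str], set[str]]:
    """Apply the three curation operations and return (TF set, TR set)."""
    tf = (tf_families - EXCLUDED - TF_TO_TR) | (tr_families & TR_TO_TF)
    tr = (tr_families - TR_TO_TF) | (tf_families & TF_TO_TR)
    return tf, tr


if __name__ == "__main__":
    # Toy candidate lists, not the real PlnTFDB/PlantTFDB contents.
    tf_in = {"MYB", "WD40-like", "TIG", "FHA", "Sigma70-like", "TAZ", "mTERF"}
    tr_in = {"DDT", "OtherTR"}  # OtherTR is a placeholder name
    tf_out, tr_out = curate(tf_in, tr_in)
    print(sorted(tf_out))  # -> ['DDT', 'MYB']
    print(sorted(tr_out))  # -> ['OtherTR', 'TAZ', 'mTERF']
```

Keeping the three operations as separate named sets makes it explicit that the four excluded families disappear entirely, whereas TAZ, mTERF, and DDT are retained but change category.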
We developed the iTAK program based on the consensus rules we derived for the identification and classification of plant TFs/TRs/PKs (Supplemental Figure 1). To evaluate the performance of iTAK, the Arabidopsis TFs predicted by iTAK were systematically compared with those identified in PlantTFDB and PlnTFDB. The three datasets shared a total of 1602 TFs, accounting for approximately 90% of the TFs in PlnTFDB, PlantTFDB, and iTAK (Figure 1B). Although most of these genes were identified as TFs in all three datasets and classified into the same families, we did observe some inconsistencies. The inconsistencies were mainly between PlnTFDB and PlantTFDB, and the iTAK classifications agreed with one of the two databases, with the exception of two genes, AT1G50680 and AT1G51120. iTAK assigned these two genes to the B3 family, rather than to the AP2/ERF-RAV family as in PlnTFDB and PlantTFDB, because they contain only B3 domains (Supplemental Table 3). This minor difference may reflect the recent update of the AP2 HMM profile in the Pfam database. Overall, the high consistency between iTAK and the other studies indicates that iTAK identifies and classifies TFs with high accuracy. A total of 112, 28, and 14 TFs were identified only in PlnTFDB, PlantTFDB, or iTAK, respectively (Figure 1C; Supplemental Table 4). Five of the 14 iTAK-specific TFs belonged to the DDT family, whose members had been inaccurately categorized as TRs in the other databases. Of the PlnTFDB-specific genes, 64 belonged to the mTERF, FHA, Sigma70-like, and TAZ families, which should not be categorized as TFs. After these discrepant families were eliminated, 48, 28, and 9 genes were predicted only by PlnTFDB, PlantTFDB, or iTAK, respectively. Furthermore, 87, 64, and 23 TFs were not identified by PlantTFDB, PlnTFDB, and iTAK, respectively, but were predicted and assigned to the same families by the other two (Figure 1D–1F).
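The two comparisons reported above, TFs found by only one source and TFs missed by one source but assigned to the same family by the other two, can be expressed as set operations over gene-to-family maps. A minimal Python sketch, assuming each source is given as a dictionary from gene ID to family; the `compare_three` helper and the gene IDs are hypothetical:

```python
def compare_three(predictions: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """For each of three sources (gene ID -> family), report genes found only
    by that source ('unique') and genes it lacks that both other sources
    assign to the same family ('missing')."""
    if len(predictions) != 3:
        raise ValueError("expected exactly three sources")
    report = {}
    for name, preds in predictions.items():
        # The two sources other than the current one.
        a, b = (p for other, p in predictions.items() if other != name)
        unique = set(preds) - set(a) - set(b)
        # Genes both other sources predict AND place in the same family.
        agreed = {gene for gene in set(a) & set(b) if a[gene] == b[gene]}
        report[name] = {"unique": unique, "missing": agreed - set(preds)}
    return report


if __name__ == "__main__":
    # Hypothetical gene IDs and assignments for illustration only.
    predictions = {
        "PlnTFDB":   {"g1": "MYB", "g2": "WRKY", "g3": "TAZ", "g5": "NAC"},
        "PlantTFDB": {"g1": "MYB", "g2": "WRKY", "g4": "B3", "g5": "NAC"},
        "iTAK":      {"g1": "MYB", "g4": "B3", "g5": "NAC", "g6": "DDT"},
    }
    for source, result in compare_three(predictions).items():
        print(source, sorted(result["unique"]), sorted(result["missing"]))
```

In this toy run, `g3` is unique to PlnTFDB and `g6` to iTAK, while iTAK misses `g2`, which both other sources assign to WRKY. A gene that the other two sources place in different families is not counted as missing, mirroring the same-family condition in the text.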
The smaller numbers of unique and missed identifications by iTAK indicate that it achieved a better balance between false positives and false negatives. iTAK missed the 23 TFs mainly because of the significance cutoff applied to the required domains (Supplemental Information; Supplemental Table 5). The identified TFs/TRs were also compared with other datasets, further supporting the high accuracy of iTAK (Supplemental Information). In summary, we have derived a set of consensus domain assignment rules for accurate identification and classification of plant TFs and TRs. We have developed a novel bioinformatics tool, iTAK, to facilitate genome-wide identification and classification of plant TFs, TRs, and PKs, as well as a comprehensive database of these regulatory proteins from sequenced plant species (Supplemental Information). These provide valuable tools and resources for the research community to study transcriptional regulation and signaling networks.
This work was supported in part by a seed grant from the Association of Independent Plant Institutes (AIPI) to S.Y.R., P.Z., and Z.F., and by grants from the National Science Foundation (IOS-0923312, IOS-1025642, and IOS-1339287 to Z.F.; DBI-0960897 and DBI-1458597 to P.X.Z.; and IOS-1026003 to S.Y.R.) and the Department of Energy (DE-SC0008769) to S.Y.R.