Hybrid sequence-structure based HMM models leverage the identification of homologous proteins: the example of class II fusion

bioRxiv (2018)

Abstract
We present a sequence-structure based method characterizing a set of functionally related proteins exhibiting low sequence identity and loose structural conservation. Given a (small) set of structures, our method consists of three main steps. First, pairwise structural alignments are combined with multi-scale geometric analysis to produce structural motifs, i.e. regions structurally more conserved than the whole structures. Second, the sub-sequences of the motifs are used to build profile hidden Markov models (HMM) biased towards the structurally conserved regions. Third, these HMM are used to retrieve from UniProtKB, in a bootstrap fashion, proteins harboring signatures compatible with the function studied. We apply these hybrid HMM to investigate two questions related to class II fusion proteins, an especially challenging class since known structures exhibit low sequence identity (less than 15%) and loose structural similarity (of the order of 15 Å in lRMSD). In a first step, we compare the performance of our hybrid HMM against that of sequence based HMM. Using various learning sets, we show that both classes of HMM retrieve unique species. The numbers of unique species reported by the two classes of methods are comparable, stressing the novelty brought by our hybrid models. In a second step, we use our models to identify 17 plausible HAP2-GCS1 candidate sequences in 10 different Drosophila melanogaster species. These candidates are not identified by the Pfam family HAP2-GCS1 (PF10699), stressing the ability of our structural motifs to capture signals more subtle than whole Pfam domains. In a more general setting, our method should be of interest for all functional families with low sequence identity and loose structural conservation. Our software tools are available from the FunChaT package of the Structural Bioinformatics Library (http://sbl.inria.fr).
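To make the second step more concrete, the following minimal Python sketch shows how aligned motif sub-sequences could be turned into position-specific emission probabilities and used to score a candidate sequence. It is an illustration under strong assumptions, not the FunChaT implementation: a real profile HMM also models insertions and deletions, whereas this sketch keeps only gap-free match columns. The function names, the uniform background, the pseudocount value, and the toy sequences are all hypothetical.

```python
from collections import Counter
import math

# The 20 standard amino acids; motif sub-sequences are assumed to use only these.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(aligned_motifs, pseudocount=1.0):
    """Per-column emission probabilities from gap-free, equal-length sub-sequences.

    Simplification: only match-state emissions are estimated, with a flat
    pseudocount (an assumption of this sketch, not the paper's scheme).
    """
    length = len(aligned_motifs[0])
    if any(len(seq) != length for seq in aligned_motifs):
        raise ValueError("all aligned sub-sequences must have the same length")
    total = len(aligned_motifs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_motifs)
        profile.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, window, background=1.0 / len(AMINO_ACIDS)):
    """Sum of per-column log-odds against a uniform background (assumed).

    Residues outside the 20 standard letters contribute a score of zero.
    """
    return sum(
        math.log(column.get(aa, background) / background)
        for column, aa in zip(profile, window)
    )


def best_window(profile, sequence):
    """Return (score, start) of the best-scoring window, or None if too short."""
    width = len(profile)
    if len(sequence) < width:
        return None
    return max(
        (log_odds_score(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy, made-up motif sub-sequences and candidate sequence.
toy_motifs = ["ACD", "ACE", "ACD"]
toy_profile = match_emissions(toy_motifs)
toy_hit = best_window(toy_profile, "GGACDGG")
```

In a retrieval setting such as the third step, a candidate would be kept when its best window scores above a chosen threshold; in practice a dedicated profile-HMM tool would replace this scoring entirely.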