
PASS2: A Database of Structure-Based Sequence Alignments of Protein Structural Domain Superfamilies.

IJKDB(2011)

Abstract
A detailed comparison of protein domains that belong to families and superfamilies shows that structure is better conserved than sequence during evolutionary divergence. Sequence alignments, guided by structural features, permit a better sampling of the protein sequence space and effective construction of libraries for fold recognition. Sequence alignments are useful evolutionary models in defining structure-function relationships for protein superfamilies. The PASS2 database, maintained by the authors, presents alignments of proteins related at the superfamily level and characterised by low sequence similarity. The number of new superfamilies increased to 47% compared with the previous PASS2 version, which shows the crucial importance of updating the PASS2 database. In the current release of the PASS2 database, the authors align protein superfamilies using a structural alignment protocol. They also introduce two alignment assessment methods that depend on the average structural deviations of domains and the extent of conserved secondary structures. In addition, they integrate new and important structural and sequence features at the superfamily level into the database: conserved-unconserved blocks in proteins, the spatial distribution of sequences obtained by principal component analysis, and a statistical view for each superfamily. The authors suggest that highly structurally deviant superfamily members could be removed as outliers, so that such extremely distant relationships will not obscure the alignment. They report a nearly automated, updated version of the superfamily alignment database, consisting of 1776 superfamilies and 9536 protein domains, that is in direct correspondence with the SCOP (1.73) database.

DOI: 10.4018/jkdb.2011100104
International Journal of Knowledge Discovery in Bioinformatics, 2(4), 53-66, October-December 2011
Copyright © 2011, IGI Global.
Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The function and biological role of a newly sequenced protein are usually inferred from a previously characterized protein using sequence and/or structure comparison methods. Comparison of protein structures may reveal distant evolutionary relationships that would not be detected from sequence information alone when sequence identities are low (Baker & Sali, 2001; Koehl, 2001), thereby helping to infer newer functional associations (Koehl, 2001). Functional inference based on sequence similarity and function annotation transfer are possible at high sequence identities (> 70%) and less reliably so at lower sequence identities (≤ 40%) (Mallika, Bhaduri, & Sowdhamini, 2002). Connecting the enormous number of protein sequences to the smaller number of pre-characterized proteins of known structure and function, by computational approaches, is one of the effective means of prediction at low sequence identities. As it is not feasible to experimentally study every protein in all genomes, structure prediction and function annotation through computational approaches on a larger scale would help in bridging the gap between them.

All protein structures determined experimentally, either by X-ray crystallography or NMR spectroscopy, are deposited in a centralized resource, the Protein Data Bank (PDB) (Berman et al., 2000). Currently, the PDB holds more than 60000 protein structures, which are organised in different databases by hierarchical classification schemes. A striking feature derived from this wealth of data is that nearly all proteins have some structural similarities to other proteins. Although these similarities may arise from general principles of physics and chemistry that limit the number of protein folds, they may also result from evolutionary relationships.
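The identity thresholds quoted above (> 70% and ≤ 40%) presuppose some way of computing percent identity from a pairwise alignment. The following is a minimal sketch, assuming identity is counted over columns where both sequences carry a residue; other denominators (for example, the length of the shorter sequence) are also in common use, and the source does not say which one was applied. The function names and the "intermediate" label for the range between the two thresholds are illustrative assumptions, not terms from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_regime(identity: float) -> str:
    """Label an identity value using the two thresholds quoted in the text.

    "intermediate" is a hypothetical label for values between the thresholds.
    """
    if identity > 70.0:
        return "high"
    if identity <= 40.0:
        return "low"
    return "intermediate"


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    identity = percent_identity("ACDE-", "ACDFG")
    print(identity, identity_regime(identity))
```

Skipping gapped columns keeps the measure symmetric in the two sequences, which is convenient when the same value is later used as a pairwise filter.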
Approaches that identify and examine these structural relationships have relied on the classification of proteins, using either structural information alone, as in CATH (Orengo et al., 1997) and FSSP (Holm & Sander, 1994), or a combination of structural and evolutionary information with a good deal of human expertise, as in SCOP (Andreeva et al., 2008). The SCOP 1.73 release holds 97178 domains from 34494 PDB entries, which are grouped into merely 1086 folds. This enormous redundancy can be exploited to provide working solutions to the problem of the structural coverage of protein space (Friedberg, Jaroszewski, Ye, & Godzik, 2004; Godzik, 2004).

Protein structural domains within a superfamily were chosen from the SCOP database. Only domains that share ≤40% sequence identity with each other within a SCOP superfamily have been considered for the current work. This filter reduces the computational time that would otherwise be spent applying rigorous structure comparison procedures to closely related structural entries, for which sequence alignments should be relatively straightforward.

MATERIALS AND METHODS

Protein Structural Domain Dataset

The information about protein structural domains and their boundaries was obtained from the SCOP 1.73 release (Andreeva et al., 2008), and the corresponding structural coordinates, for domains sharing ≤40% sequence identity at the superfamily level, were downloaded from the ASTRAL compendium (Chandonia et al., 2004). The current structural database was constructed as in the previous PASS2 version (Bhaduri, Pugalenthi, & Sowdhamini, 2004) with some modifications such as the inclusion of assessment of alignments (Figure 1). In this update, we have categorized the superfamilies according to the number of structural entries in each: single-member superfamilies (SMS), two-member superfamilies (TMS) and multi-member superfamilies (MMS).
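The SMS/TMS/MMS categorization described above can be sketched as a simple count of members per superfamily. This is a minimal illustration, assuming the input is a list of (domain identifier, superfamily identifier) pairs that have already passed the ≤40% identity filter; the identifiers and function names are hypothetical and do not reflect the actual PASS2 pipeline or ASTRAL file formats.

```python
from collections import defaultdict


def group_by_superfamily(domains):
    """Group domain identifiers by superfamily.

    `domains` is an iterable of (domain_id, superfamily_id) pairs
    (a hypothetical input layout chosen for this sketch).
    """
    members = defaultdict(list)
    for domain_id, superfamily_id in domains:
        members[superfamily_id].append(domain_id)
    return dict(members)


def categorize(member_count: int) -> str:
    """Map a member count to the SMS / TMS / MMS labels used in the text."""
    if member_count < 1:
        raise ValueError("a superfamily needs at least one member")
    if member_count == 1:
        return "SMS"  # single-member superfamily
    if member_count == 2:
        return "TMS"  # two-member superfamily
    return "MMS"      # multi-member superfamily (three or more members)


# Made-up identifiers, for illustration only.
example_domains = [
    ("dom_a1", "sf_A"),
    ("dom_b1", "sf_B"), ("dom_b2", "sf_B"),
    ("dom_c1", "sf_C"), ("dom_c2", "sf_C"), ("dom_c3", "sf_C"),
]
categories = {
    superfamily: categorize(len(ids))
    for superfamily, ids in group_by_superfamily(example_domains).items()
}

if __name__ == "__main__":
    print(categories)
```

Keeping the grouping and the labelling as separate steps lets the same member lists be passed on to whichever alignment route applies to each category.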
The flowchart describes three phases of the algorithm, namely the initial alignment phase, the final alignment phase and the alignment assessment phase. The initial alignment phase includes building the initial alignment of two-member superfamilies by using programs such as MINRMS or ClustalW or MALIGN