Improving the consistency of domain annotation within the Conserved Domain Database.

Myra K. Derbyshire, Noreen R. Gonzales, Shennan Lu, Jane He, Gabriele H. Marchler, Zhouxi Wang, Aron Marchler-Bauer

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2015)

Abstract

When protein sequences are annotated with the footprints of evolutionarily conserved domains, conservative score or E-value thresholds must be applied to RPS-BLAST hits to avoid many false positives. We observe that manual inspection and classification of hits gathered at a more permissive threshold can add a significant amount of valuable domain annotation. We report an automated algorithm that 'rescues' valuable borderline-scoring domain hits that are well supported by domain architecture (DA, the sequential order of conserved domains in a protein query), including tandem repeats of domain hits reported at a more conservative threshold. This algorithm is now available as a selectable option on the public conserved domain search (CD-Search) pages. We also report on the possibility of 'suppressing' domain hits close to the threshold when they lack a well-supported DA, and of implementing this conservatively as an option in live conserved domain searches and for pre-computed results. Improving domain annotation consistency will in turn reduce the fraction of NR sequences with incomplete DAs.
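
The 'rescue' idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hit` type, the `rescue_hits` function, both E-value thresholds, and the set of known architectures are hypothetical and chosen only for illustration. The tandem-repeat check is also simplified to "the same domain was already found at the conservative threshold", without testing adjacency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    domain: str    # domain model identifier (hypothetical naming)
    start: int     # start position of the hit on the query
    evalue: float  # E-value reported for the hit


def rescue_hits(hits, known_architectures, strict=0.01, relaxed=1.0):
    """Keep confident hits and rescue borderline hits supported by a DA.

    `strict` and `relaxed` are illustrative thresholds, not values
    taken from the paper. `known_architectures` is an assumed set of
    tuples of domain identifiers in sequential order.
    """
    confident = [h for h in hits if h.evalue <= strict]
    borderline = [h for h in hits if strict < h.evalue <= relaxed]
    confident_domains = {h.domain for h in confident}

    kept = list(confident)
    # Consider the best-scoring borderline hits first.
    for hit in sorted(borderline, key=lambda h: h.evalue):
        # Architecture that would result from accepting this hit.
        candidate = sorted(kept + [hit], key=lambda h: h.start)
        architecture = tuple(h.domain for h in candidate)
        # Simplified repeat test: same domain already found confidently.
        is_repeat = hit.domain in confident_domains
        if is_repeat or architecture in known_architectures:
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


hits = [
    Hit("A", 1, 1e-10),  # confident hit
    Hit("A", 50, 0.2),   # borderline repeat of a confident domain
    Hit("B", 100, 0.5),  # borderline, completes a known architecture
    Hit("C", 200, 0.9),  # borderline, no supporting architecture
]
known = {("A", "A", "B")}
rescued = rescue_hits(hits, known)
```

In this toy input, the borderline hits for `A` and `B` are kept because they are supported by a repeat or by a known architecture, while `C` is dropped. A 'suppress' step would apply the reverse test to hits that barely pass the conservative threshold.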

Keywords

conserved domain database, domain annotation, consistency
