SeqScreen - a biocuration platform for robust taxonomic and biological process characterization of nucleic acid sequences of interest.

BIBM(2019)

Abstract
Rapid advancements in synthetic biology and nucleic acid synthesis, and in particular concerns about their intentional or accidental misuse, call for more sophisticated screening tools to identify genes of interest within short sequence fragments. One major gap in predicting genes of concern is the inadequacy of current tools and ontologies for describing the specific biological processes of pathogenic proteins. The objective of this work is to design software that sensitively assigns taxonomic classifications, functional annotations, and biological processes of interest to short nucleotide sequences of unknown origin (50 bp to 1,000 bp). The overarching goal is to perform sensitive characterization of short sequences and highlight specific pathogenic biological processes of interest (BPoIs). The SeqScreen software executes these tasks in analytical workflows with Nextflow and outputs results in a tab-delimited report. Local and global alignments differentiate hits to taxonomically related sequences from hits to similar but unrelated sequences, and an ensemble approach leverages multiple tools and databases to assign a variety of functional terms to each query sequence. Final biological process assessments are made from the predicted functional annotations, which draw on information in pre-existing databases as well as new custom biocurations. Machine learning models predict each biological process of interest across large protein databases before the predictions are incorporated into the SeqScreen framework; this precomputation improves computational efficiency, ensures reproducible results, allows for version control, and facilitates review of the automated predictions by expert biocurators. The SeqScreen source code is available at https://gitlab.com/treangenlab/seqscreen.
Keywords
taxonomically-related sequences, query sequence, protein databases, biocuration platform, biological process characterization, nucleic acid sequences, synthetic biology, nucleic acid synthesis, screening tools, ontologies, taxonomic classifications, nucleotide sequences, SeqScreen software, pathogenic biological processes of interest, Nextflow, machine learning models, bioinformatics