VirusTaxo: Taxonomic classification of virus genome using multi-class hierarchical classification by k-mer enrichment

Genomics(2021)

Cited 0|Views1
No score
Abstract
Taxonomic classification of viruses is a multi-class hierarchical classification problem, as taxonomic ranks (e.g., order, family and genus) of viruses are hierarchically structured and have multiple classes in each rank. Classification of biological sequences which are hierarchically structured with multiple classes is challenging. Here we developed a machine learning architecture, VirusTaxo, using a multi-class hierarchical classification by k-mer enrichment. VirusTaxo classifies DNA and RNA viruses to their taxonomic ranks using genome sequence. To assign taxonomic ranks, VirusTaxo extracts k-mers from genome sequence and creates bag-of-k-mers for each class in a rank. VirusTaxo uses a top-down hierarchical classification approach and accurately assigns the order, family and genus of a virus from the genome sequence. The average accuracies of VirusTaxo for DNA viruses are 99% (order), 98% (family) and 95% (genus) and for RNA viruses 97% (order), 96% (family) and 82% (genus). VirusTaxo can be used to detect taxonomy of novel viruses using full length genome or contig sequences. Availability Online version of VirusTaxo is available at . ### Competing Interest Statement The authors have declared no competing interest.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined