Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods.

Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) (2022)

Abstract
Archaea are single-celled organisms that are found in practically every habitat and serve essential functions in the ecosystem, such as carbon fixation and nitrogen cycling. Classifying these organisms is challenging because most have not been isolated in a laboratory and are known only from gene sequences detected in ambient samples. This paper presents an automated classification approach for any taxonomic level, based on an ensemble method that uses non-comparative features. The methodology overcomes the problems of reference-based classification because it classifies sequences using features of the biological sequences themselves rather than resorting directly to reference genomes. Overall, we obtained high accuracy at different taxonomic levels. For example, the phylum classification task achieved 96% accuracy, and the genus identification task achieved 91% accuracy in a pool of 55 different genera. These results show that the proposed methodology is a fast, highly accurate solution for archaea identification and classification, which is of particular interest given how challenging these organisms are to classify. The method and complete study are freely available, under the GPLv3 license, at https://github.com/jorgeMFS/Archaea2.
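
As a rough illustration of what non-comparative (reference-free) features can look like, the Python sketch below computes three per-sequence values: length, GC content, and a compression-based complexity estimate. The function name `sequence_features`, the choice of features, and the use of `zlib` as the compressor are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import zlib
from collections import Counter


def sequence_features(seq: str) -> dict:
    """Compute a few reference-free features for one nucleotide sequence.

    Hypothetical helper: the feature set and the zlib compressor are
    illustrative assumptions, not the method described in the paper.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)
    # Fraction of bases that are G or C.
    gc_content = (counts["G"] + counts["C"]) / n

    # Compressed size in bits per base: a crude complexity estimate.
    # Repetitive sequences compress well and therefore score low.
    compressed = zlib.compress(seq.encode("ascii"), 9)
    bits_per_base = 8 * len(compressed) / n

    return {
        "length": n,
        "gc_content": gc_content,
        "bits_per_base": bits_per_base,
    }
```

Feature vectors of this kind could then be passed to any standard classifier, so that a sequence is labelled without aligning it against reference genomes.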
Keywords
Archaeal sequences, Feature-based classification, Taxonomic identification, Data compression, Feature selection