Identification the source of fecal contamination for geographically unassociated samples with a statistical classification model based on support vector machine.

Journal of Hazardous Materials (2020)

Abstract
The bacterial diversity revealed by high-throughput sequencing, together with its biological significance, provides a wealth of information for source tracking of fecal contamination. This study examined how well classification models built with the support vector machine (SVM) algorithm predict the fecal source of geographically local and non-local samples. Random forest (RF) and AdaBoost were applied for comparison. Discriminatory sequences were selected from the Clostridiales, Bacteroidales, or Lactobacillales bacterial groups using extremely randomized trees (ExtraTrees). The representative markers comprised 1.51-12.64% of the unique sequences in the original library and accounted for 70% of the discrepancies between source microbiomes. The overall accuracy on local samples was 96.08% for the SVM model and 98.04% for the RF model, both higher than that of AdaBoost (90.20%). For the non-local samples, the SVM assigned most of the fecal samples to the correct category, although several false-positive judgments occurred among closely related groups. The results suggest that the SVM is a time-saving and accurate method for fecal source tracking in contaminated water bodies, with the potential to handle geographically unassociated samples, and they underline the necessity of qPCR analysis for accurate detection of human-source pollution.
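To make the "overall accuracy" figures concrete, the following minimal Python sketch computes overall accuracy as the share of correctly classified samples in a confusion matrix. The confusion matrix, the three-source layout, and the test-set size of 51 are assumptions introduced here for illustration; the abstract does not report sample counts. The size 51 is used only because 49, 50, and 46 correct predictions out of 51 reproduce the three reported percentages.

```python
def overall_accuracy(confusion):
    """Return the fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    return correct / total


# Hypothetical confusion matrix (rows: true source, columns: predicted source).
# These counts are invented for illustration and are NOT the paper's data.
example_confusion = [
    [17, 0, 0],
    [1, 16, 0],
    [0, 1, 16],
]
example_accuracy = overall_accuracy(example_confusion)
print(f"example accuracy: {example_accuracy:.2%}")

# Assumption: a local test set of 51 samples (not stated in the abstract).
ASSUMED_TEST_SIZE = 51
assumed_correct = {"SVM": 49, "RF": 50, "AdaBoost": 46}
for model, correct in assumed_correct.items():
    print(f"{model}: {correct}/{ASSUMED_TEST_SIZE} = {correct / ASSUMED_TEST_SIZE:.2%}")
```

Under that assumed test-set size, the gap between the SVM and RF models amounts to a single sample, which is worth keeping in mind when comparing the two.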
Key words
Machine learning, 16S rRNA, Amplicon sequencing, Fecal source tracking, SVM