A comparison of bioinformatics pipelines for compositional analysis of the human gut microbiome

biorxiv(2023)

Abstract
Investigating the impact of the gut microbiome on human health is a rapidly growing area of research. A significant limiting factor for progress in this field is the lack of consistency between study results, which hampers the correct biological interpretation of findings. One reason is variation in the applied bioinformatics analysis pipelines. This study aimed to compare five frequently used bioinformatics pipelines (NG-Tax 1.0, NG-Tax 2.0, QIIME, QIIME2 and mothur) for the analysis of 16S rRNA marker gene sequencing data and to determine whether and how the analytical methods affect the downstream statistical analysis results. Based on a publicly available case-control study of ADHD and two mock communities, we show that the choice of bioinformatic pipeline affects not only the analysis of 16S rRNA gene sequencing data but consequently also the downstream association results. Differences were observed in the number of ASVs/OTUs obtained (range: 1,958 - 20,140), the number of unclassified ASVs/OTUs (range: 210 - 8,092) and the number of genera (range: 176 - 343). The case versus control comparison also yielded different results across the pipelines. Based on our results we recommend: i) QIIME1 and mothur when interested in rare and/or low-abundant taxa, ii) NG-Tax1 or NG-Tax2 when favouring stringent artefact filtering, iii) QIIME2 for a balance between the two abovementioned points, and iv) using at least two pipelines to assess the robustness of the results. This work illustrates the strengths and limitations of frequently used microbial bioinformatics pipelines in the context of the biological conclusions of case-control comparisons. With this, we hope to contribute to “best practice” approaches for microbiome analysis, promoting methodological consistency and replication of microbial findings.

Author Summary

Studies increasingly demonstrate the relevance of gut microbiota in understanding human health and disease. However, the lack of consistency between study results is a significant limiting factor for progress in this field. The reasons for this include variation in study design, sample size, bacterial DNA extraction and sequencing method, bioinformatics analysis pipeline and statistical analysis methodology. This paper focuses on the variation generated by bioinformatics pipelines. The choice of bioinformatic pipeline can influence the assessment of microbial diversity. However, it is unclear to what extent and how the results and conclusions of a case-control study can be influenced. Therefore, we compared the results of a case-control study across different pipelines (applying default settings) while using the same dataset. Our results indicate a lack of consistency across the pipelines. We show that the choice of bioinformatic pipeline affects not only the analysis results of 16S rRNA gene sequencing data from the gut microbiome, but also the associated conclusions for the case-control study. This means that different conclusions would be drawn from the same data analysed with different bioinformatic pipelines.

### Competing Interest Statement

The authors have declared no competing interest.