Normalization methods for microbial abundance data strongly affect correlation estimates

bioRxiv(2020)

Abstract
Consistent normalization of microbial genomic survey count data is fundamental to modern microbiome research. Technical artifacts in these data often obstruct standard comparison of microbial composition across samples and experiments. To correct for sampling bias, library size, and technical variability, a number of different normalization methods have been proposed, including adaptations of RNA-seq analysis workflows and log-ratio transformations from compositional data analysis. However, the effects of data normalization on higher-order summary statistics have remained elusive. We review and compare popular data normalization schemes and assess their effect on subsequent correlation estimation. Application of these normalization methods to the largest publicly available human gut microbiome dataset shows substantial variation among patterns of correlation. We show that log-ratio and variance-stabilization transformations provide the most consistent estimates across experiments of different sample sizes. We also show that the results of data analysis methods that rely on correlation, such as data clustering and network inference, differ depending on the normalization scheme. These findings have important implications for microbiome studies in multiple stages of analysis.
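As a rough illustration of how the choice of normalization can change correlation estimates, the sketch below applies two common schemes to a small count table: total-sum scaling (relative abundances) and the centered log-ratio (CLR) transformation. It is not the authors' pipeline. The count table, the function names, and the pseudocount of 1 used to handle zeros are all assumptions made for illustration.

```python
import numpy as np

def total_sum_scale(counts):
    """Divide each sample (row) by its library size to get relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each entry minus the per-sample mean log.

    The pseudocount (an assumption here) avoids taking log(0) for absent taxa.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

# Hypothetical toy count table: 5 samples (rows) x 4 taxa (columns).
counts = np.array([
    [120, 30,  0,  50],
    [ 80, 45, 10,  65],
    [200, 10,  5,  85],
    [ 60, 70, 20,  50],
    [150, 25,  0, 125],
])

# Pearson correlation between taxa (columns) under each normalization.
corr_tss = np.corrcoef(total_sum_scale(counts), rowvar=False)
corr_clr = np.corrcoef(clr(counts), rowvar=False)

# Largest absolute disagreement between the two taxon-taxon correlation matrices.
max_diff = np.abs(corr_tss - corr_clr).max()
print(f"max |corr_tss - corr_clr| = {max_diff:.3f}")
```

Comparing the two matrices entry by entry gives a simple view of how far taxon-taxon correlations can shift with the normalization alone, before any clustering or network inference is applied.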