Metagenomic coverage bias at transcription start sites is correlated with gene expression

bioRxiv (2024)

Abstract
Metagenomic sequencing is presumed to provide unbiased sampling of all the genetic material in a sample. Downstream analysis methods, such as binning, gene copy number analysis, and structural variation or single nucleotide polymorphism analysis, commonly assume an even distribution of coverage across the genome after accounting for known artefacts such as GC content. We discovered a coverage bias across gut microbiome species, manifesting as a difference in coverage before and after bacterial transcription start sites (TSSs). Using matched metatranscriptomic and metagenomic sequencing data, we demonstrate that this bias correlates with gene expression. Potential artefacts such as the sequencing technology, the reference genome used for alignment, and mappability bias were investigated across multiple datasets and shown not to account for the association. While GC bias was found to be correlated with coverage bias, the association of coverage bias with gene expression remains significant after adjusting for GC bias. Paired-end read mapping demonstrated an enrichment of 5’ read ends immediately downstream of the TSS, which was partly a byproduct of unmapped reads upstream of the TSS. Our observations suggest the existence of strain-level variation, where sequence variation in the promoter region prevents proper read alignment to the reference genome. The correlation of this phenomenon with gene expression may also reflect evolutionary footprints of the fine-tuning of gene expression regulation. Understanding the source of this sequence variation and the biological implications of this artefact will be useful not only to better characterise microbial functions but also to improve interpretations of strain-level dynamics.

Importance

Sequencing coverage calculated from metagenomic sequencing data is extensively used in the microbiome field, providing valuable information about microbial abundances, gene (functional) abundances, growth rates, and genomic variations.
Understanding the factors that affect the distribution of coverage along genomes is therefore important for multiple applications. In this study, we report uneven read coverage across the transcription start sites of bacterial genomes that is correlated with gene expression levels. We determine that this bias is independent of multiple factors, including GC bias, and arises from higher strain divergence from reference genomes upstream of the transcription start site. We propose that evolutionary fine-tuning of gene expression in competitive microbial ecosystems can drive genetic mutations at the promoter site. Our findings suggest the potential to glean gene regulatory information from metagenomic data and to better understand how ecological factors shape genomes in the microbiome and their sequencing coverage.