Large contribution of repeats to genetic variation in a transmission cluster of Mycobacterium tuberculosis

bioRxiv (2024)

Abstract
Repeats are the most diverse and dynamic, but also the least well understood component of microbial genomes. For all we know, repeat-associated mutations such as duplications, deletions, inversions, and gene conversion might be as common as point mutations, but because of short-read myopia and methodological bias they have received much less attention. Long-read sequencing opens the perspective of resolving repeats and systematically investigating the mutations they induce. For this study, we assembled the genomes of 16 closely related strains of the bacterial pathogen Mycobacterium tuberculosis from PacBio HiFi reads, with the aim of characterizing the full spectrum of DNA polymorphisms. We find that complete and accurate genomes can be assembled from HiFi reads, with read size being the main limitation in the presence of duplications. By combining a reference-free pangenome graph with extensive repeat annotation, we identified 110 variants, 58 of which can be assigned to repeat-associated mutational mechanisms such as strand slippage and homologous recombination. While recombination events are less frequent than point mutations, they can affect large regions and introduce multiple variants at once, as shown by three gene conversion events and a duplication of 7.3 kb that involve ppe18 and ppe57, two genes possibly involved in immune subversion. Our study shows that the contribution of repeat-associated mechanisms of mutation can be similar to that of point mutations at the microevolutionary scale of an outbreak. A large reservoir of unstudied genetic variation in this "monomorphic" bacterial pathogen awaits investigation.

Competing Interest Statement: The authors have declared no competing interest.