
Use of a Candida albicans SC5314 PacBio HiFi reads dataset to close gaps in the reference genome assembly, reveal a subtelomeric gene family, and produce accurate phased allelic sequences.

Lois L Hoyer, Brian A Freeman, Elizabeth K Hogan, Alvaro G Hernandez

Frontiers in Cellular and Infection Microbiology (2024)

Abstract
Candida albicans SC5314 is the most-often used strain for molecular manipulation of the species. The SC5314 reference genome sequence is the result of considerable effort from many scientists and has advanced research into fungal biology and pathogenesis. Although the resource is highly developed and presented in a phased diploid format, the sequence includes gaps and does not extend to the telomeres on its eight chromosome pairs. Accurate SC5314 genome assembly is complicated by the presence of extensive repeated sequences and considerable allelic length variation at some loci. Advances in genome sequencing technology provide the tools to obtain highly accurate long-read data that span even the most-difficult-to-assemble genome regions. Here, we describe derivation of a PacBio HiFi data set and creation of a collapsed haploid telomere-to-telomere assembly of the SC5314 genome (ASM3268872v1) that revealed previously unknown features of the strain. ASM3268872v1 subtelomeric distances were up to 19 kb larger than in the reference genome and revealed a family of highly conserved DNA helicase-encoding genes at 10 of the 16 chromosome ends. We also describe alignments of individual HiFi reads to deduce accurate diploid sequences for the most notoriously difficult-to-assemble C. albicans genes: the agglutinin-like sequence (ALS) gene family. We provide a tutorial that demonstrates how the HiFi reads can be visualized to explore any region of interest. Availability of the HiFi reads data set and the ASM3268872v1 comparative guide assembly will streamline research efforts because accurate diploid sequences can be derived using simple in silico methods rather than time-consuming laboratory-bench approaches.
Keywords
genome sequence, Candida albicans, pathogenic yeast genomes, PacBio sequence data, allelic sequences, telomere-to-telomere