Whole Chromosome Haplotype Phasing from Long-Range Sequencing

bioRxiv (2019)

Abstract
Haplotype phase represents the collective genetic variation between homologous chromosomes and is an essential feature of polyploid genomes. Determining haplotype phase requires knowing both the genotypes at variant sites and their linkage along each homologous chromosome. Although short-read sequencing can produce accurate genotype information, it cannot resolve linkage between genotypes because sequencing fragments are short (≲1 kb). Long-read and long-range sequencing technologies can reveal linkage information across a wide range of genomic distances (10 kb to 100 Mb), but this information is often sparse and contaminated by several sources of error. The extent to which long-range sequencing can produce accurate long-range haplotype information remains unknown. Here we describe a general computational framework, based on a one-dimensional spin model, for inferring haplotype phase and assessing phasing accuracy from long-range sequencing data. Building on this model, we demonstrate a two-tier phasing strategy that combines 60× linked-read sequencing and 60× Hi-C sequencing to achieve complete whole-chromosome phasing of diploid genomes. The computationally inferred haplotypes show high completeness (>95%) and accuracy (~99%) when compared with haplotypes determined directly by sequencing single chromosomes. Our results provide a scalable solution for generating completely phased genomes from bulk sequencing and enable haplotype-resolved genome analysis at scale.
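The abstract frames phasing as a one-dimensional spin model but does not spell out its form. As a minimal sketch, assuming the common Ising-style encoding (one spin s_i = ±1 per heterozygous site, pairwise couplings J_ij accumulated from reads or contacts that link two sites, and the phase taken as a low-energy state of H(s) = -Σ J_ij s_i s_j), the Python below phases a toy chromosome. The function names (build_couplings, energy, phase, switch_errors), the greedy single-site/suffix-flip optimizer, and the switch-error count used as an accuracy measure are hypothetical illustrative choices, not the authors' implementation.

```python
# Toy 1D spin-model haplotype phaser (illustrative sketch, not the paper's code).
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Coupling J[(i, j)] with i < j: net support for sites i and j being in cis (+)
# or trans (-). Hypothetical data structure chosen for this sketch.
Couplings = Dict[Tuple[int, int], float]


def build_couplings(links: Iterable[Tuple[int, int, int]]) -> Couplings:
    """Sum link observations (i, j, sign), sign = +1 (cis) or -1 (trans)."""
    J: Couplings = defaultdict(float)
    for i, j, sign in links:
        if i != j:
            J[(min(i, j), max(i, j))] += sign
    return dict(J)


def energy(spins: List[int], J: Couplings) -> float:
    """Ising energy H(s) = -sum_{i<j} J_ij * s_i * s_j."""
    return -sum(w * spins[i] * spins[j] for (i, j), w in J.items())


def phase(n: int, J: Couplings, max_sweeps: int = 50) -> List[int]:
    """Greedy zero-temperature search over single-site and suffix flips.

    Flipping a suffix (every spin from k onward) repairs a switch error in
    one move; flipping a single site repairs an isolated mis-phased variant.
    Only strictly lower-energy moves are accepted, so this finds a local,
    not necessarily global, minimum.
    """
    spins = [1] * n
    best = energy(spins, J)
    for _ in range(max_sweeps):
        improved = False
        for k in range(n):
            for flip in (range(k, k + 1), range(k, n)):
                trial = spins[:]
                for t in flip:
                    trial[t] = -trial[t]
                e = energy(trial, J)
                if e < best:
                    spins, best, improved = trial, e, True
        if not improved:
            break
    # Haplotype labels A/B are interchangeable, so fix the global sign.
    if spins and spins[0] < 0:
        spins = [-s for s in spins]
    return spins


def switch_errors(inferred: List[int], truth: List[int]) -> int:
    """Count adjacent site pairs whose relative phase disagrees with truth."""
    return sum(
        inferred[i] * inferred[i + 1] != truth[i] * truth[i + 1]
        for i in range(len(truth) - 1)
    )


# Usage: simulated links between sites up to 2 apart, plus one erroneous link.
truth = [1, 1, -1, -1, 1, -1]
n = len(truth)
links = [(i, j, truth[i] * truth[j])
         for i in range(n) for j in range(i + 1, min(i + 3, n))]
links.append((0, 1, -1))  # a contradicting (error) observation
J = build_couplings(links)
est = phase(n, J)
print(est, switch_errors(est, truth))  # → [1, 1, -1, -1, 1, -1] 0
```

Here the single erroneous link only cancels one coupling, so the search still recovers the simulated phase with zero switch errors. Under the same assumptions, the two-tier idea could be mimicked by first phasing sites within linked-read blocks and then treating each block as a single spin, with Hi-C contacts summed into inter-block couplings; this reuse is a hypothetical illustration, not a description of the paper's pipeline.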