An integrated Asian human SNV and indel benchmark established using multiple sequencing methods

SCIENTIFIC REPORTS(2020)

引用 4|浏览18
暂无评分
摘要
Sequencing technologies have been rapidly developed recently, leading to the breakthrough of sequencing-based clinical diagnosis, but accurate and complete genome variation benchmark would be required for further assessment of precision medicine applications. Despite the human cell line of NA12878 has been successfully developed to be a variation benchmark, population-specific variation benchmark is still lacking. Here, we established an Asian human variation benchmark by constructing and sequencing a stabilized cell line of a Chinese Han volunteer. By using seven different sequencing strategies, we obtained ~3.88 Tb clean data from different laboratories, hoping to reach the point of high sequencing depth and accurate variation detection. Through the combination of variations identified from different sequencing strategies and different analysis pipelines, we identified 3.35 million SNVs and 348.65 thousand indels, which were well supported by our sequencing data and passed our strict quality control, thus should be high confidence variation benchmark. Besides, we also detected 5,913 high-quality SNVs which had 969 sites were novel and located in the high homologous regions supported by long-range information in both the co-barcoding single tube Long Fragment Read (stLFR) data and PacBio HiFi CCS data. Furthermore, by using the long reads data (stLFR and HiFi CCS), we were able to phase more than 99% heterozygous SNVs, which helps to improve the benchmark to be haplotype level. Our study provided comprehensive sequencing data as well as the integrated variation benchmark of an Asian derived cell line, which would be valuable for future sequencing-based clinical development.
更多
查看译文
关键词
Next-generation sequencing,Rare variants,Science,Humanities and Social Sciences,multidisciplinary
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要