Identifying, understanding, and correcting technical biases on the sex chromosomes in next-generation sequencing data

bioRxiv (2018)

Abstract
Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. This sequence homology can cause the mismapping of short sequencing reads derived from the sex chromosomes and affect variant calling and other downstream analyses. Understanding, and correcting, this problem is critical for medical genomics and population genomic inference. Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that: 1) aids in the inference of sex chromosome complement from next-generation sequencing data, 2) corrects erroneous read mapping on the sex chromosomes, and 3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We show how these metrics can be used to identify XX and XY individuals across diverse sequencing experiments, including low and high coverage whole genome sequencing, and exome sequencing. We also show that the default steps taken by XYalign correct many mismapped reads on the sex chromosomes, resulting in more accurate variant calling. Finally, we discuss how the flexibility of XYalign's framework can be leveraged for other use cases including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3).
Keywords
X chromosome,Y chromosome,ploidy,aneuploidy,genomics,variant calling,mapping
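
The abstract mentions sequencing depth as one of the metrics that can help distinguish XX from XY individuals. The sketch below illustrates that general idea only: it compares mean depth on X and Y to mean autosomal depth. It is not XYalign's implementation or API. The function name `infer_complement`, the labels "XX-like" and "XY-like", and the threshold of 0.1 are all hypothetical, chosen for illustration, and real data would also need mapping quality and allele balance to be taken into account.

```python
from statistics import mean


def infer_complement(autosome_depths, x_depths, y_depths, y_threshold=0.1):
    """Rough guess at sex chromosome complement from depth alone.

    Each argument is a list of per-window read depths. The Y threshold
    is a hypothetical illustration value, not a recommended default.
    Returns (x_ratio, y_ratio, label).
    """
    autosome_mean = mean(autosome_depths)
    if autosome_mean <= 0:
        raise ValueError("autosomal depth must be positive")

    # Normalize by autosomal depth so samples sequenced to different
    # coverage can be compared on the same scale.
    x_ratio = mean(x_depths) / autosome_mean
    y_ratio = mean(y_depths) / autosome_mean

    # Appreciable Y depth suggests a Y chromosome is present. A small
    # nonzero Y ratio can arise from mismapped X-derived reads.
    label = "XY-like" if y_ratio >= y_threshold else "XX-like"
    return x_ratio, y_ratio, label


if __name__ == "__main__":
    # Made-up depths: X and Y each at about half the autosomal depth.
    print(infer_complement([30, 30, 30], [15, 15], [14, 16]))
```

Under these assumptions, an X ratio near 1.0 with a Y ratio near 0 is consistent with XX, while X and Y ratios both near 0.5 are consistent with XY. Depth alone can be misleading in homologous regions, which is the problem the abstract describes.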