Hardy Weinberg Exact Test In Large Scale Variant Calling Quality Control

bioRxiv (2016)

Abstract
The Hardy-Weinberg Equilibrium (HWE) test is widely used as a quality control measure to detect sequencing artifacts such as mismapping, allelic dropout, and biases. However, in the high-throughput sequencing era, where sample sizes exceed a thousand, the utility of the HWE test in reducing the false positive rate remains unclear. In this paper, we demonstrate that the HWE test has limited power to identify sequencing artifacts when the variant allele frequency is lower than 1%, using a variant call set produced from more than five thousand whole-genome-sequenced samples from two homogeneous populations. We develop a novel strategy for implementing HWE filtering in which we incorporate site frequency spectrum information and determine the p-value cutoff that optimizes the tradeoff between sensitivity and specificity. The novel strategy is shown to outperform the exact test of HWE with an empirical constant p-value cutoff regardless of the sequencing sample size. We also present best-practice recommendations for identifying possible sources of false positives in large sequencing datasets, based on an analysis of intrinsic biases in the variant calling process. Our novel strategy of determining the HWE test p-value cutoff and applying the test to common variants provides a practical approach to variant-level quality control in upcoming sequencing projects with tens to hundreds of thousands of samples.
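For readers unfamiliar with the exact test of HWE mentioned above, the following is a minimal Python sketch of the standard two-sided exact test, which conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. It is an illustration only and is not taken from the paper: the function names `hwe_exact_pvalue` and `flag_hwe_outlier`, the required `p_cutoff` argument, and the `min_maf=0.01` default are assumptions chosen here. In particular, the sketch does not implement the paper's site-frequency-spectrum-based choice of cutoff; the caller must supply the cutoff.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact test p-value for HWE at a biallelic site.

    Conditions on the allele counts and sums the probabilities of every
    possible heterozygote count that is no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # log P(het heterozygotes | n samples, n_a copies of allele A)
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    log_obs = log_prob(n_ab)
    # Heterozygote counts must share the parity of the allele count.
    p_value = 0.0
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= log_obs + 1e-9:  # small tolerance for floating-point ties
            p_value += exp(lp)
    return min(1.0, p_value)


def flag_hwe_outlier(n_aa, n_ab, n_bb, p_cutoff, min_maf=0.01):
    """Hypothetical filter: test only common variants, flag p < p_cutoff.

    min_maf=0.01 mirrors the 1% frequency mentioned in the abstract;
    p_cutoff must be supplied by the caller (no default is assumed here).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    maf = min(2 * n_aa + n_ab, 2 * n_bb + n_ab) / (2 * n)
    if maf < min_maf:
        return False  # rare variant: the test is skipped in this sketch
    return hwe_exact_pvalue(n_aa, n_ab, n_bb) < p_cutoff
```

As a usage example, `flag_hwe_outlier(50, 0, 50, p_cutoff=1e-6)` flags a site with a complete absence of heterozygotes, whereas a site with genotype counts 25/50/25 is not flagged. The log-gamma formulation is used so that factorials of large sample sizes do not overflow.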
Keywords
Hardy Weinberg Equilibrium, whole genome sequencing, variant calling quality control, false positive rate, sequencing artifacts, site frequency spectrum, QA/QC