Average weighted nucleotide diversity is more precise than pixy in estimating the true value of pi from sequence sets containing missing data

Molecular Ecology Resources (2023)

Abstract
Nucleotide diversity remains an important statistic in population genetic/genomic studies. Although recent advances in massive sequencing make generating sequence data sets cheaper and faster, currently used technologies often introduce substantial amounts of missing nucleotides in their output. A novel method of estimating pi from data sets containing missing data, pixy, has also recently been proposed. In this study, the pixy estimator, pi(pixy), was compared to average weighted nucleotide diversity, pi(W). The estimators were tested both on sequences simulated in fastsimcoal and on real sequence sets. Both sets were modified by random insertion of missing nucleotides. Weighted nucleotide diversity performed better in all pairwise comparisons. It was characterized by a smaller error and a narrower distribution of the results. pi(pixy) tends to overestimate the nucleotide diversity when both the proportion of missing data and the level of variation are low. Of the two estimators, only pi(W) estimated the true nucleotide diversity in some of the simulations. A simple formula for estimating pi(W) allows for easy integration of the estimator in packages such as pixy, which would yield more precise estimates of nucleotide diversity either in a sliding window or for discrete genomic regions.
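To make the setting concrete, the following minimal Python sketch shows how missing nucleotides can be inserted at random into aligned sequences, and how a pooled ratio of pairwise differences to pairwise comparisons (the general idea behind pixy-style estimation) skips missing sites. The function names `mask_randomly` and `pi_pooled_ratio`, the missing-data symbol `N`, and the toy sequences are assumptions made for illustration; they are not taken from the paper or from the pixy package. The abstract does not state the formula for pi(W), so it is not reproduced here.

```python
import random
from itertools import combinations

MISSING = "N"  # assumed symbol for a missing nucleotide


def mask_randomly(seq, p, rng):
    """Replace each base with MISSING independently with probability p."""
    return "".join(MISSING if rng.random() < p else base for base in seq)


def pairwise_counts(a, b):
    """Return (differences, comparable sites) for two aligned sequences."""
    diffs = comps = 0
    for x, y in zip(a, b):
        if x == MISSING or y == MISSING:
            continue  # a site is compared only if both bases are known
        comps += 1
        if x != y:
            diffs += 1
    return diffs, comps


def pi_pooled_ratio(seqs):
    """Total pairwise differences divided by total pairwise comparisons."""
    total_diffs = total_comps = 0
    for a, b in combinations(seqs, 2):
        d, c = pairwise_counts(a, b)
        total_diffs += d
        total_comps += c
    return total_diffs / total_comps if total_comps else float("nan")


# Toy alignment (hypothetical data) with one missing base.
toy = ["ACGT", "ACGA", "NCGT"]
estimate = pi_pooled_ratio(toy)

# Random insertion of missing data, seeded for reproducibility.
rng = random.Random(1)
masked = [mask_randomly(s, 0.25, rng) for s in toy]
```

In the toy alignment, the three pairs contribute 4, 3, and 3 comparable sites and 1, 0, and 1 differences, so the pooled ratio is 2/10. Repeating the masking step at different values of `p` is one way to explore how an estimator's error changes with the proportion of missing data.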
Key words
bioinformatics/phyloinformatics, genetic variation, missing data, next-generation sequencing, nucleotide diversity, statistics