An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses

Guohui Shi, Yi Dai, Da Zhou, Mengmeng Chen, Jiaqi Zhang, Yilong Bi, Shuai Liu, Qi Wu

Mycology (2024)

Abstract
Pangenomes are replacing single reference genomes as a way to capture all variants within a species or clade, but their analysis predominantly relies on graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments. K-mer decomposition offers an alternative to graph-based pangenomes; however, how to use k-mers directly in population genetic analyses remains unclear. Here, we developed a novel strategy that uses variation in k-mer counts among genomes for population analyses. To test the effectiveness of this method, we compared it directly with the SNP-based method in analyses of the population structure and genetic diversity of 267 Saccharomyces cerevisiae strains, using two simulated datasets and one real sequencing dataset. The population structure identified with k-mers recapitulates that obtained with SNPs, indicating that the k-mer-based approach is effective, and the higher genetic diversity estimated from the real dataset supports the conclusion that k-mers capture more genetic variants. Based on k-mer frequency, we identified not only SNPs but also some insertions/deletions (indels) and horizontal gene transfer (HGT) fragments associated with the adaptive evolution of S. cerevisiae. Our study establishes a framework for the alignment- and reference-free (ARF) method in population genetic analyses, whose advantages should be most pronounced in species without a complete genome or in highly diverged species.
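The core idea of comparing samples by which k-mers they contain, rather than by SNPs called against an aligned reference, can be illustrated with a small sketch. This is not the authors' pipeline: the use of canonical k-mers, the restriction to k-mers present in some but not all samples, the fraction of differing k-mers as a pairwise distance, and the mean of that distance over all pairs as a crude diversity index are all assumptions made for illustration, and every function name below is hypothetical.

```python
from itertools import combinations

# Hypothetical helpers illustrating a k-mer presence/absence comparison;
# this is NOT the authors' implementation.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(seq: str, k: int) -> set:
    """Collect canonical k-mers, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            kmers.add(canonical(window))
    return kmers


def presence_matrix(genomes: dict, k: int):
    """Build a 0/1 matrix over k-mers present in some, but not all, samples."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    n = len(sets)
    all_kmers = sorted(set().union(*sets.values()))
    # Assumption: only variable k-mers carry information about differences.
    variable = [km for km in all_kmers
                if 0 < sum(km in s for s in sets.values()) < n]
    matrix = {name: [int(km in s) for km in variable] for name, s in sets.items()}
    return variable, matrix


def pairwise_distance(a, b) -> float:
    """Fraction of variable k-mers whose presence/absence differs between two samples."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(matrix) -> float:
    """Mean distance over all sample pairs; a crude, pi-like diversity index (assumption)."""
    pairs = list(combinations(matrix, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_distance(matrix[a], matrix[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    genomes = {"s1": "ACGTACGTAA", "s2": "ACGTACGTTA", "s3": "ACGTTCGTAA"}
    variable, matrix = presence_matrix(genomes, k=4)
    for a, b in combinations(matrix, 2):
        print(a, b, round(pairwise_distance(matrix[a], matrix[b]), 3))
    print("mean pairwise distance:", round(mean_pairwise_distance(matrix), 3))
```

A pairwise distance matrix built this way could, in principle, feed clustering or PCA to inspect population structure. Python sets are only practical for toy inputs; whole-genome data would call for a dedicated k-mer counting tool.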
Key words
K-mers, population genomics, genetic diversity, alignment-free, reference-free