1. An alignment- and reference-free strategy using <italic>k</italic>-mer present pattern for population genomic analyses.
- Author
- Shi, Guohui; Dai, Yi; Zhou, Da; Chen, Mengmeng; Zhang, Jiaqi; Bi, Yilong; Liu, Shuai; and Wu, Qi
- Abstract
Pangenomes are replacing single reference genomes to capture all variants within a species or clade, but their analysis predominantly leverages graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments.
K-mer decomposition is an alternative to graph-based pangenomes. However, how to directly use k-mers for the population genetic analyses is unknown. Here, we developed a novel strategy that uses the variants of k-mer count in the genome for population analyses. To test the effectivity of this method, we compared it directly to the SNP-based method on the analysis of population structure and genetic diversity of 267 Saccharomyces cerevisiae strains within two simulated datasets and a real sequence dataset. The population structure identified with k-mers recapitulates that obtained using SNPs, indicating the effectiveness of k-mer-based approach, and higher genetic diversity within real dataset supported k-mers contained more genetic variants. Based on k-mer frequency, we found not only SNP but also some insertion/deletion and horizontal gene transfer (HGT) fragments related to the adaptive evolution of S. cerevisiae. Our study creates a framework for the alignment- and reference-free (ARF) method in population genetic analyses, which will be more pronounced in the species with no complete genome or highly diverged species. [ABSTRACT FROM AUTHOR]
- Published
- 2024
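
The abstract describes comparing strains by their k-mer content rather than by aligned SNPs. As a rough illustration of that general idea only, the sketch below builds a k-mer presence/absence matrix for a few toy sequences and computes pairwise Jaccard distances. It is not the authors' pipeline: the function names (`kmer_counts`, `presence_matrix`, `jaccard_distance`), the toy sequences, the choice of k = 4, and the use of Jaccard distance are all assumptions made for this example, and reverse-complement (canonical) k-mers are deliberately ignored.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer in one sequence (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def presence_matrix(samples, k):
    """Return the sorted k-mer list and a 0/1 presence row per sample."""
    counts = {name: kmer_counts(seq, k) for name, seq in samples.items()}
    all_kmers = sorted(set().union(*counts.values()))
    matrix = {
        name: [1 if kmer in c else 0 for kmer in all_kmers]
        for name, c in counts.items()
    }
    return all_kmers, matrix


def jaccard_distance(row_a, row_b):
    """1 - |shared k-mers| / |k-mers present in either sample|."""
    shared = sum(1 for x, y in zip(row_a, row_b) if x and y)
    union = sum(1 for x, y in zip(row_a, row_b) if x or y)
    return 1.0 - shared / union if union else 0.0


# Hypothetical toy "strains"; real input would be reads or assemblies.
samples = {
    "strain_A": "ACGTACGTGA",
    "strain_B": "ACGTACGTGA",
    "strain_C": "ACGTTCGTGA",
}

kmers, matrix = presence_matrix(samples, k=4)
distances = {
    (a, b): jaccard_distance(matrix[a], matrix[b])
    for a, b in combinations(samples, 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

A distance matrix of this kind could then be fed to a clustering or ordination step to look at population structure; that downstream step, and any filtering of rare or error-derived k-mers, is left out here.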