Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model.
- Source :
- Bioinformatics advances [Bioinform Adv] 2022 Aug 12; Vol. 2 (1), pp. vbac055. Date of Electronic Publication: 2022 Aug 12 (Print Publication: 2022).
- Publication Year :
- 2022
Abstract
- While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data.
- Availability and Implementation: Our software is available open source at https://github.com/nishatbristy007/NSB.
- Supplementary Information: Supplementary data are available at Bioinformatics Advances online.
- (© The Author(s) 2022. Published by Oxford University Press.)
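The idea of replacing letters and recomputing Jaccard indices between k-mer sets can be illustrated with a small sketch. This is not the NSB software or the TK4 estimator. The purine/pyrimidine (RY) replacement map, the function names, and the toy sequences are all assumptions chosen for illustration, and the sketch ignores the chance k-mer matches that the abstract says must be corrected for on larger genomes. The mismatch formula assumes independent per-site changes and two genomes of equal length, in which case a Jaccard index J on k-mers implies a per-site mismatch fraction of 1 - (2J/(1+J))^(1/k).

```python
# Illustrative sketch only: the RY mapping and all names below are assumptions,
# not the replacement scheme or API of the NSB software.

COMPLEMENT = str.maketrans("ACGTRY", "TGCAYR")
RY_MAP = str.maketrans("ACGT", "RYRY")  # A,G -> R (purine); C,T -> Y (pyrimidine)


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of seq.

    Each k-mer is represented by the smaller of itself and its reverse
    complement, so the result does not depend on which strand was read.
    """
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def jaccard(a, b):
    """Jaccard index of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mismatch_fraction(j, k):
    """Per-site mismatch fraction implied by a k-mer Jaccard index j.

    Assumes independent per-site changes, equal genome lengths, and no
    k-mer matches arising by chance.
    """
    if j <= 0.0:
        return 1.0
    return 1.0 - (2.0 * j / (1.0 + j)) ** (1.0 / k)


def jaccard_after_replacement(seq1, seq2, k, mapping=None):
    """Jaccard index of canonical k-mer sets, optionally after letter replacement."""
    if mapping is not None:
        seq1, seq2 = seq1.translate(mapping), seq2.translate(mapping)
    return jaccard(canonical_kmers(seq1, k), canonical_kmers(seq2, k))


if __name__ == "__main__":
    k = 4
    reference = "AAAAAAAA"
    transition = "AAAAGAAA"    # A -> G stays within purines, hidden by RY coding
    transversion = "AAAACAAA"  # A -> C crosses classes, still visible after RY coding
    for name, other in [("transition", transition), ("transversion", transversion)]:
        j_raw = jaccard_after_replacement(reference, other, k)
        j_ry = jaccard_after_replacement(reference, other, k, RY_MAP)
        print(name, j_raw, j_ry, mismatch_fraction(j_ry, k))
```

The RY map is used here because it is closed under complementation (the complement of a purine is a pyrimidine), so canonical k-mers remain well defined when the strand is unknown. Comparing the Jaccard index before and after a replacement separates the substitution types that the replacement hides from those it leaves visible, which is the kind of information a model richer than Jukes-Cantor needs.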
Details
- Language :
- English
- ISSN :
- 2635-0041
- Volume :
- 2
- Issue :
- 1
- Database :
- MEDLINE
- Journal :
- Bioinformatics advances
- Publication Type :
- Academic Journal
- Accession number :
- 35992043
- Full Text :
- https://doi.org/10.1093/bioadv/vbac055