
Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model.

Authors :
Balaban M
Bristy NA
Faisal A
Bayzid MS
Mirarab S
Source :
Bioinformatics advances [Bioinform Adv] 2022 Aug 12; Vol. 2 (1), pp. vbac055. Date of Electronic Publication: 2022 Aug 12 (Print Publication: 2022).
Publication Year :
2022

Abstract

While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data.
Availability and Implementation: Our software is available open source at https://github.com/nishatbristy007/NSB.
Supplementary Information: Supplementary data are available at Bioinformatics Advances online.
(© The Author(s) 2022. Published by Oxford University Press.)
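The abstract's central step, replacing letters and recomputing Jaccard indices between k-mer sets, can be illustrated with a minimal Python sketch. This is not the NSB software and does not reproduce the TK4 distance formulas or the correction for chance k-mer matches. All function names are hypothetical, and the specific replacement used here (collapsing A/T into W and C/G into S) is an assumption chosen only because it stays consistent under reverse complementation; the replacements used in the paper may differ. Canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) stand in for the unknown strand.

```python
from typing import Dict, Set

# Standard DNA complement map.
DNA_COMPLEMENT: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical letter replacement (an assumption for illustration only):
# collapse A/T into W and C/G into S. Under this reduced alphabet each
# symbol is its own complement, so strand-agnostic k-mers stay well defined.
WS_MAPPING: Dict[str, str] = {"A": "W", "T": "W", "C": "S", "G": "S"}
WS_COMPLEMENT: Dict[str, str] = {"W": "W", "S": "S"}


def reverse_complement(seq: str, complement: Dict[str, str]) -> str:
    """Reverse complement of seq under the given complement map."""
    return "".join(complement[ch] for ch in reversed(seq))


def canonical_kmers(seq: str, k: int, complement: Dict[str, str]) -> Set[str]:
    """Set of canonical k-mers; k-mers with unknown symbols (e.g. N) are skipped."""
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in complement for ch in kmer):
            continue
        # Keep the smaller of the k-mer and its reverse complement,
        # so both strands yield the same representative.
        kmers.add(min(kmer, reverse_complement(kmer, complement)))
    return kmers


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def replace_letters(seq: str, mapping: Dict[str, str]) -> str:
    """Apply a letter replacement; symbols missing from the mapping are kept."""
    return "".join(mapping.get(ch, ch) for ch in seq)


def jaccard_after_replacement(seq1: str, seq2: str, k: int,
                              mapping: Dict[str, str],
                              complement: Dict[str, str]) -> float:
    """Replace letters in both sequences, then recompute the k-mer Jaccard index."""
    s1 = canonical_kmers(replace_letters(seq1, mapping), k, complement)
    s2 = canonical_kmers(replace_letters(seq2, mapping), k, complement)
    return jaccard(s1, s2)


if __name__ == "__main__":
    x, y = "AAAC", "AAAG"
    before = jaccard(canonical_kmers(x, 2, DNA_COMPLEMENT),
                     canonical_kmers(y, 2, DNA_COMPLEMENT))
    after = jaccard_after_replacement(x, y, 2, WS_MAPPING, WS_COMPLEMENT)
    print(before, after)
```

Comparing the Jaccard index before and after a replacement shows how mismatches of a particular substitution type stop counting once the two letters are merged, which is the kind of signal that would be needed to estimate substitution frequencies without an alignment. Exact sets are used here for clarity; genome-scale data would normally require sketching or a k-mer counter.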

Details

Language :
English
ISSN :
2635-0041
Volume :
2
Issue :
1
Database :
MEDLINE
Journal :
Bioinformatics advances
Publication Type :
Academic Journal
Accession number :
35992043
Full Text :
https://doi.org/10.1093/bioadv/vbac055