Evolutionary distances in the twilight zone--a rational kernel approach
- Source :
- PLoS ONE, Vol 5, Iss 12, p e15788 (2010), PLoS ONE
- Publication Year :
- 2010
- Publisher :
- Public Library of Science (PLoS), 2010.
Abstract
- Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
- Comment: to appear in PLoS ONE
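
The abstract notes that the similarity score is positive semi-definite. One general consequence, sketched here as an illustration only, is that any positive semi-definite kernel k induces a distance via d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). The example below is not the transducer-based score from the paper: it substitutes a simple k-mer spectrum kernel as a stand-in, and the function names `kmer_counts`, `spectrum_kernel`, and `kernel_distance` are hypothetical. Whether the authors derive their distance with this exact formula is not stated in the abstract.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count the overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of k-mer count vectors.

    Stand-in kernel (an assumption for illustration), not the
    finite-state transducer score described in the abstract.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def kernel_distance(a, b, k=3):
    """Distance induced by a positive semi-definite kernel."""
    squared = (spectrum_kernel(a, a, k)
               + spectrum_kernel(b, b, k)
               - 2 * spectrum_kernel(a, b, k))
    # Clamp at zero to guard against negative rounding error.
    return sqrt(max(squared, 0.0))


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTTCGT", "TTTTACGA"]
    for s in seqs:
        print([round(kernel_distance(s, t), 3) for t in seqs])
```

Because the distance is a Euclidean distance in the kernel's feature space, it is symmetric and satisfies the triangle inequality, which is what makes such scores usable as input to distance-based tree reconstruction.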
- Subjects :
- FOS: Computer and information sciences
Cancer Research
Molecular Sequence Data
lcsh:Medicine
Sequence alignment
Machine Learning (stat.ML)
Protein Structure, Secondary
Evolution, Molecular
Statistics - Machine Learning
Chlorophyta
Evolutionary Modeling
Computer Simulation
Evolutionary Systematics
Amino Acid Sequence
Divergence (statistics)
Quantitative Biology - Populations and Evolution
lcsh:Science
Biology
Alignment-free sequence analysis
Phylogeny
Probability
Physics
Genetics
Sequence
Likelihood Functions
Evolutionary Biology
Multidisciplinary
Multiple sequence alignment
Models, Statistical
Phylogenetic tree
Applied Mathematics
String (computer science)
lcsh:R
Populations and Evolution (q-bio.PE)
Computational Biology
Information bottleneck method
Markov Chains
Phylogenetics
FOS: Biological sciences
DNA, Intergenic
lcsh:Q
Algorithm
Sequence Alignment
Sequence Analysis
Algorithms
Mathematics
Research Article
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 5
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....883fc546b53379356e6707627eb336cd