Rapid alignment-free phylogenetic identification of metagenomic sequences
- Source :
- Bioinformatics, Oxford University Press (OUP), 2019, 35 (18), pp. 3303-3312. ⟨10.1093/bioinformatics/btz068⟩, ⟨10.1101/328740⟩
- Publication Year :
- 2018
Abstract
- Motivation: Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing.
Results: We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. RAPPAS scales PP for the era of routine metagenomic diagnostics.
Availability and implementation: Program and sources freely available for download at https://github.com/blinard-BIOINFO/RAPPAS.
Supplementary information: Supplementary data are available at Bioinformatics online.
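The lookup step described in the abstract can be illustrated with a toy sketch. It is not the RAPPAS implementation: the database layout (k-mer mapped to per-branch probabilities), the function names `kmers` and `place`, the `default_prob` floor for k-mers missing from a branch, and the log-probability sum used as a score are all assumptions made for illustration.

```python
import math


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def place(query, db, k, default_prob):
    """Score candidate branches for a query without aligning it.

    db maps k-mer -> {branch_id: probability} (hypothetical layout).
    A branch with no stored entry for a query k-mer is charged
    default_prob, an assumed floor value.
    Returns (best_branch_or_None, scores_by_branch).
    """
    words = list(kmers(query, k))
    # Only branches that store at least one query k-mer are candidates.
    branches = {b for w in words for b in db.get(w, {})}
    scores = {
        b: sum(math.log(db.get(w, {}).get(b, default_prob)) for w in words)
        for b in branches
    }
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Toy precomputed database; branch names and probabilities are invented.
TOY_DB = {
    "ACG": {"branch_1": 0.9, "branch_2": 0.2},
    "CGT": {"branch_1": 0.8},
    "GTA": {"branch_2": 0.7},
}

best, scores = place("ACGT", TOY_DB, k=3, default_prob=0.01)
print(best, scores)
```

In this sketch the database is built once and then queried for any number of reads, which mirrors the reuse across metagenomes mentioned in the abstract. How the real database is computed from ancestral sequences is not covered here.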
- Subjects :
- 0106 biological sciences
Statistics and Probability
Computer science
Sequence analysis
Sequence alignment
Computational biology
computer.software_genre
[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy
010603 evolutionary biology
01 natural sciences
Biochemistry
DNA sequencing
03 medical and health sciences
chemistry.chemical_compound
Phylogenetics
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Environmental DNA
Molecular Biology
Phylogeny
030304 developmental biology
Sequence
0303 health sciences
Likelihood Functions
Phylogenetic tree
Biological classification
Sequence Analysis, DNA
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Computer Science Applications
Computational Mathematics
Tree (data structure)
Computational Theory and Mathematics
chemistry
Metagenomics
Precomputation
Metagenome
Data mining
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
computer
Sequence Alignment
DNA
Algorithms
Software
Details
- ISSN :
- 1367-4811, 1367-4803, and 1460-2059
- Volume :
- 35
- Issue :
- 18
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....4b91bdfde15d89cd6213427e94a3f292
- Full Text :
- https://doi.org/10.1101/328740