Exploring Frequented Regions in Pan-Genomic Graphs
- Source :
- BCB
- Publication Year :
- 2018
Abstract
- We consider the problem of identifying regions within a pan-genome De Bruijn graph that are traversed by many sequence paths. We define such regions and the subpaths that traverse them as frequented regions (FRs). In this work, we formalize the FR problem and describe an efficient algorithm for finding FRs. We then propose applications of FRs based on machine learning and pan-genome graph simplification, and demonstrate their effectiveness using data sets for the organisms Staphylococcus aureus (bacterium) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs, for example by identifying introgressions in yeast that aid alcohol tolerance, and show that FRs are useful for classifying yeast strains by industrial use and for visualizing pan-genomic space.
- Subjects :
- 0301 basic medicine
Staphylococcus aureus
animal structures
0206 medical engineering
Genomics
02 engineering and technology
Computational biology
Saccharomyces cerevisiae
Biology
Bioinformatics
De Bruijn graph
03 medical and health sciences
Databases, Genetic
Genetics
Computer Graphics
Cluster analysis
Genome
Efficient algorithm
Applied Mathematics
Sequence Analysis, DNA
Graph
030104 developmental biology
020602 bioinformatics
Algorithms
Biotechnology
Details
- ISSN :
- 1557-9964
- Volume :
- 16
- Issue :
- 5
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Accession number :
- edsair.doi.dedup.....3bdbb4d9c8de63b855bbaad142f5bf2e