Back to Search
Start Over
Accelerating Sequence Alignment to Graphs
- Source :
- IPDPS
- Publication Year :
- 2019
- Publisher :
- Cold Spring Harbor Laboratory, 2019.
-
Abstract
- Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices.AvailabilityThe implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
- Subjects :
- 0303 health sciences
Xeon
Computer science
0206 medical engineering
Locality
Parallel algorithm
02 engineering and technology
Graph
Vertex (geometry)
Dynamic programming
03 medical and health sciences
High memory
0302 clinical medicine
String graph
SIMD
Time complexity
Algorithm
020602 bioinformatics
030217 neurology & neurosurgery
030304 developmental biology
Subjects
Details
- Database :
- OpenAIRE
- Journal :
- IPDPS
- Accession number :
- edsair.doi.dedup.....febeb2d89cfc3e184e92fb121c813422
- Full Text :
- https://doi.org/10.1101/651638