Sorry, I don't understand your search. ×
Back to Search Start Over

Validating Paired-End Read Alignments in Sequence Graphs

Authors :
Chirag Jain and Haowen Zhang and Alexander Dilthey and Srinivas Aluru
Jain, Chirag
Zhang, Haowen
Dilthey, Alexander
Aluru, Srinivas
Chirag Jain and Haowen Zhang and Alexander Dilthey and Srinivas Aluru
Jain, Chirag
Zhang, Haowen
Dilthey, Alexander
Aluru, Srinivas
Publication Year :
2019

Abstract

Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1358725883
Document Type :
Electronic Resource
Full Text :
https://doi.org/10.4230.LIPIcs.WABI.2019.17