201. Readjoiner: a fast and memory efficient string graph-based sequence assembler
- Author
-
Stefan Kurtz and Giorgio Gonnella
- Subjects
Theoretical computer science ,Computer science ,Computation ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,De Bruijn graph ,symbols.namesake ,Structural Biology ,String graph ,Humans ,Computer Simulation ,Molecular Biology ,lcsh:QH301-705.5 ,Transitive relation ,Models, Genetic ,Genome, Human ,Methodology Article ,Applied Mathematics ,Sequence Analysis, DNA ,Graph ,Computer Science Applications ,lcsh:Biology (General) ,symbols ,Graph (abstract data type) ,lcsh:R858-859.7 ,Algorithms ,Software - Abstract
Background Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads. Results Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only. Conclusions Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
- Published
- 2012