
Multiple Sequence Alignment Based on a Suffix Tree and Center-Star Strategy: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework.

Authors :
Su, Wenhe
Liao, Xiangke
Lu, Yutong
Zou, Quan
Peng, Shaoliang
Source :
Journal of Computational Biology. Dec 2017, Vol. 24 Issue 12, p1230-1242. 13p.
Publication Year :
2017

Abstract

Multiple sequence alignment (MSA) is an essential prerequisite and a dominant method for deducing biological facts from a set of molecular biological sequences. It refers to a series of algorithmic solutions for the alignment of evolutionarily related sequences while taking into account evolutionary events such as mutations, insertions, deletions, and rearrangements under certain conditions. These methods can be applied to DNA, RNA, or protein sequences. In this work, we take advantage of a center-star strategy to reduce the MSA problem to pairwise alignments, and we use a suffix tree to match identical substrings between the two sequences of each pair. Multiple sequence alignment based on a suffix tree and center-star strategy (MASC) can accomplish MSA in O(mn) time, which is linear in the total input size, where m is the number of sequences and n is the average length of sequences. Furthermore, we execute our method on the Spark distributed parallel framework to deal with ever-increasing massive data sets. Our method is significantly faster than previous techniques, with no loss in accuracy for highly similar nucleotide sequences such as homologous sequences, which we demonstrate experimentally. Compared with mainstream MSA tools (e.g., MAFFT), MASC could finish the alignment of 67,200 sequences longer than 10,000 bp in 9 minutes, a task that takes MAFFT more than 3.5 days. [ABSTRACT FROM AUTHOR]
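
The center-star reduction mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the MASC implementation. It swaps the suffix-tree matching for plain Needleman-Wunsch dynamic programming, so each pairwise alignment here is quadratic rather than linear, and it runs on a single machine without Spark. The choice of center (the sequence with the highest total pairwise score), the scoring values, and the function names `needleman_wunsch` and `center_star` are all assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment. Returns (score, aligned_a, aligned_b).
    Scoring values are illustrative assumptions, not taken from the paper."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def center_star(seqs):
    """Center-star MSA sketch: align every sequence to one center sequence,
    then merge the pairwise alignments ("once a gap, always a gap")."""
    m = len(seqs)
    # Assumption: the center is the sequence with the highest total pairwise score.
    totals = [0] * m
    for i in range(m):
        for j in range(i + 1, m):
            s, _, _ = needleman_wunsch(seqs[i], seqs[j])
            totals[i] += s
            totals[j] += s
    c = max(range(m), key=totals.__getitem__)
    center = seqs[c]

    # gaps[k] = largest number of gap columns any pairwise alignment opens
    # before center position k (k == len(center) covers trailing gaps).
    gaps = [0] * (len(center) + 1)
    pairs = {}
    for i, s in enumerate(seqs):
        if i == c:
            continue
        _, ca, sa = needleman_wunsch(center, s)
        pairs[i] = (ca, sa)
        k = run = 0
        for ch in ca:
            if ch == "-":
                run += 1
            else:
                gaps[k] = max(gaps[k], run)
                run = 0
                k += 1
        gaps[k] = max(gaps[k], run)

    def expand(ca, sa):
        # Pad each gap block of this pairwise alignment up to the merged width.
        out, k, run = [], 0, 0
        for x, y in zip(ca, sa):
            if x == "-":
                out.append(y)
                run += 1
            else:
                out.append("-" * (gaps[k] - run))
                out.append(y)
                k += 1
                run = 0
        out.append("-" * (gaps[k] - run))
        return "".join(out)

    # Rows are returned in the same order as the input sequences.
    return [expand(center, center) if i == c else expand(*pairs[i])
            for i in range(m)]
```

Calling `center_star(["ACGT", "ACT", "ACGGT"])` returns one gapped row per input sequence, all of equal length. Only m - 1 pairwise alignments against the center are needed for the final result, which is what makes the strategy attractive once the pairwise step itself is fast.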

Details

Language :
English
ISSN :
10665277
Volume :
24
Issue :
12
Database :
Academic Search Index
Journal :
Journal of Computational Biology
Publication Type :
Academic Journal
Accession number :
126711146
Full Text :
https://doi.org/10.1089/cmb.2017.0040