PASTASpark: multiple sequence alignment meets Big Data.

Authors :
Abuín, José M.
Pena, Tomás F.
Pichel, Juan C.
Source :
Bioinformatics; Sep2017, Vol. 33 Issue 18, p2948-2950, 3p
Publication Year :
2017

Abstract

Motivation: One basic step in many bioinformatics analyses is multiple sequence alignment. One of the state-of-the-art tools for performing multiple sequence alignment is PASTA (Practical Alignments using SATé and TrAnsitivity). PASTA supports multithreading, but it is limited to processing datasets on shared-memory systems. In this work we introduce PASTASpark, a tool that uses the Big Data engine Apache Spark to boost the performance of the alignment phase of PASTA, which is the most expensive task in terms of time consumption. Results: Speedups of up to 10× with respect to single-threaded PASTA were observed, which allows an ultra-large dataset of 200 000 sequences to be processed within the 24-h limit. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1367-4803
Volume :
33
Issue :
18
Database :
Complementary Index
Journal :
Bioinformatics
Publication Type :
Academic Journal
Accession number :
125106185
Full Text :
https://doi.org/10.1093/bioinformatics/btx354