1. Multiple Sequence Alignment for Large Heterogeneous Datasets Using SATé, PASTA, and UPP.
- Author
- Warnow T and Mirarab S
- Subjects
- Algorithms; Big Data; Computer Simulation; Models, Statistical; Computational Biology/methods; Sequence Alignment/methods; Software
- Abstract
The estimation of very large multiple sequence alignments is a challenging problem that requires special techniques in order to achieve high accuracy. Here we describe two software packages, PASTA and UPP, for constructing alignments on large and ultra-large datasets. Both methods have been able to produce highly accurate alignments on 1,000,000 sequences, and trees computed on these alignments are also highly accurate. PASTA provides the best tree accuracy when the input sequences are all full-length, but UPP provides improved accuracy compared to PASTA and other methods when the input contains a large number of fragmentary sequences. Both methods are available in open-source form on GitHub.
- Published
- 2021