Back to Search Start Over

Construction of Customized Sub-Databases from NCBI-nr Database for Rapid Annotation of Huge Metagenomic Datasets Using a Combined BLAST and MEGAN Approach.

Authors :
Yu, Ke
Zhang, Tong
Source :
PLoS ONE. Apr2013, Vol. 8 Issue 4, p1-9. 9p.
Publication Year :
2013

Abstract

We developed a fast method to construct local sub-databases from the NCBI-nr database for the quick similarity search and annotation of huge metagenomic datasets based on BLAST-MEGAN approach. A three-step sub-database annotation pipeline (SAP) was further proposed to conduct the annotation in a much more time-efficient way which required far less computational capacity than the direct NCBI-nr database BLAST-MEGAN approach. The 1st BLAST of SAP was conducted using the original metagenomic dataset against the constructed sub-database for a quick screening of candidate target sequences. Then, the candidate target sequences identified in the 1st BLAST were subjected to the 2nd BLAST against the whole NCBI-nr database. The BLAST results were finally annotated using MEGAN to filter out those mistakenly selected sequences in the 1st BLAST to guarantee the accuracy of the results. Based on the tests conducted in this study, SAP achieved a speedup of āˆ¼150ā€“385 times at the BLAST e-value of 1eā€“5, compared to the direct BLAST against NCBI-nr database. The annotation results of SAP are exactly in agreement with those of the direct NCBI-nr database BLAST-MEGAN approach, which is very time-consuming and computationally intensive. Selecting rigorous thresholds (e.g. e-value of 1eā€“10) would further accelerate SAP process. The SAP pipeline may also be coupled with novel similarity search tools (e.g. RAPsearch) other than BLAST to achieve even faster annotation of huge metagenomic datasets. Above all, this sub-database construction method and SAP pipeline provides a new time-efficient and convenient annotation similarity search strategy for laboratories without access to high performance computing facilities. SAP also offers a solution to high performance computing facilities for the processing of more similarity search tasks. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
19326203
Volume :
8
Issue :
4
Database :
Academic Search Index
Journal :
PLoS ONE
Publication Type :
Academic Journal
Accession number :
87677153
Full Text :
https://doi.org/10.1371/journal.pone.0059831