
NG-Tax, a highly accurate and validated pipeline for analysis of 16S rRNA amplicons from complex biomes [version 2; referees: 2 approved, 1 approved with reservations, 1 not approved]

Authors :
Javier Ramiro-Garcia
Gerben D. A. Hermes
Christos Giatsis
Detmer Sipkema
Erwin G. Zoetendal
Peter J. Schaap
Hauke Smidt
Author Affiliations :
1. TI Food and Nutrition (TIFN), Wageningen, 6703 HB, The Netherlands
2. Laboratory of Microbiology, Wageningen University, Wageningen, 6708 WE, The Netherlands
3. Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6708 WE, The Netherlands
4. Aquaculture and Fisheries Group, Wageningen University, Wageningen, 6708 WD, The Netherlands
Source :
F1000Research. 5:1791
Publication Year :
2018
Publisher :
London, UK: F1000 Research Limited, 2018.

Abstract

Background: Massive high-throughput sequencing of short, hypervariable segments of the 16S ribosomal RNA (rRNA) gene has transformed the methodological landscape describing microbial diversity within and across complex biomes. However, several studies have shown that the methodology, rather than the biological variation, is responsible for the observed sample composition and distribution. This compromises meta-analyses, although this fact is often disregarded.
Results: To facilitate true meta-analysis of microbiome studies, we developed NG-Tax, a pipeline for 16S rRNA gene amplicon sequence analysis that was validated with different mock communities and benchmarked against QIIME as a frequently used pipeline. The microbial composition of 49 independently amplified mock samples was characterized by sequencing two variable 16S rRNA gene regions, V4 and V5-V6, in three separate sequencing runs on Illumina’s HiSeq2000 platform. This allowed for the evaluation of important causes of technical bias in taxonomic classification: 1) run-to-run sequencing variation, 2) PCR error, and 3) region/primer-specific amplification bias. Despite the short read length (~140 nt) and all technical biases, the average specificity of the taxonomic assignment for the phylotypes included in the mock communities was 97.78%. On average, 99.95% and 88.43% of the reads could be assigned to at least family or genus level, respectively, while assignment to ‘spurious genera’ represented on average only 0.21% of the reads per sample. Analysis of α- and β-diversity confirmed conclusions guided by biology rather than the aforementioned methodological aspects, which was not achieved with QIIME.
Conclusions: Different biological outcomes are commonly observed due to 16S rRNA region-specific performance. NG-Tax demonstrated high robustness against choice of region and other technical biases associated with 16S rRNA gene amplicon sequencing studies, diminishing their impact and providing accurate qualitative and quantitative representation of the true sample composition. This will improve comparability between studies and facilitate efforts towards standardization.

Details

ISSN :
20461402
Volume :
5
Database :
F1000Research
Journal :
F1000Research
Notes :
Revised. Amendments from Version 1: In the new manuscript we replaced RDP with the SILVA Incremental Aligner (SINA) to classify the full-length sequences, and we also updated the database in NG-Tax to SILVA 128, improving classification in both cases. We substantially increased the amount and detail of information in the description of the general workflow. All the critical steps, including barcode and primer filtering, OTU picking, mapping rejected reads to accepted OTUs, de novo chimera filtering, taxonomic assignment and the generation of a phylogenetic tree, are now detailed in Figure 1 and explained in the user manual. To further improve interpretation, we have added Table 1, which provides detailed information on the number of misclassified reads at different taxonomic levels, and Figure 5, which shows boxplots of the distances to the expected composition. We also performed statistical tests to quantitatively compare the performance of NG-Tax and QIIME. We performed a PERMANOVA analysis with mock community (MC) type as the factor, which was significant for both pipelines, meaning that some of the variance is explained by the mock community type. However, to evaluate accuracy and reproducibility and to compare pipeline performance, we used pairwise distances and t-tests (Figure 7 and Dataset 1). As suggested by the reviewers, we included the tables with the taxonomic profiles as supplementary data, which can be used for evaluation of the results. In an effort to increase comparability, we performed an additional analysis using QIIME with a 0.1% abundance threshold, which is included in the supplementary data.
Publication Type :
Academic Journal
Accession number :
edsfor.10.12688.f1000research.9227.2
Document Type :
research-article
Full Text :
https://doi.org/10.12688/f1000research.9227.2