Analyzing Multi-locus Plant Barcoding Datasets with a Composition Vector Method Based on Adjustable Weighted Distance

Authors :
Guo-Sheng Han
Ka Hou Chu
Chi Pang Li
Zu-Guo Yu
Source :
PLoS ONE, Vol 7, Iss 7, p e42154 (2012)
Publication Year :
2012
Publisher :
Public Library of Science (PLoS), 2012.

Abstract

Background: The composition vector (CV) method has been shown to be a reliable and fast alignment-free method for analyzing large COI barcoding datasets. In this study, we modify this method to analyze multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable weighted algorithm for the vector distance, in which the weights depend on the ratio of the sequence lengths of the candidate genes for each pair of taxa.

Methodology/Principal Findings: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. The success rates of grouping sequences at the genus/species level with the modified CV approach were always higher than those obtained with the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets; for the matK+rbcL+trnH-psbA dataset, it outperformed the traditional approach by 16.7%.

Conclusions: We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.
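
To make the idea of a length-weighted vector distance concrete, here is a minimal Python sketch. It is not the authors' C++ implementation, and the abstract does not give the exact formulas, so every detail below is an assumption made for illustration: the function names (`kmer_freqs`, `cosine_distance`, `weighted_distance`), the use of raw k-mer frequencies with no background correction, the choice of k = 3, and the rule that each gene's weight is its share of the combined sequence length for the pair of taxa.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in seq (raw counts, no background correction)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_distance(u, v):
    """Distance D = (1 - C) / 2, where C is the cosine of the angle between sparse vectors u and v."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # assumption: an empty vector is treated as uncorrelated
    return (1 - dot / (norm_u * norm_v)) / 2


def weighted_distance(taxon_a, taxon_b, k=3):
    """Combine per-gene distances for two taxa.

    Assumed weighting: each shared gene contributes in proportion to its
    share of the pair's total sequence length across all shared genes.
    taxon_a and taxon_b map gene names (e.g. "matK") to sequence strings.
    """
    genes = [g for g in taxon_a if g in taxon_b]
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in genes}
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("the two taxa share no non-empty gene sequences")
    return sum(
        lengths[g] / total
        * cosine_distance(kmer_freqs(taxon_a[g], k), kmer_freqs(taxon_b[g], k))
        for g in genes
    )
```

Under these assumptions, a longer gene influences the combined distance more than a shorter one, and the weights are recomputed for every pair of taxa. A pairwise distance matrix built this way could then be passed to a tree-building method for the grouping step.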

Details

ISSN :
19326203
Volume :
7
Database :
OpenAIRE
Journal :
PLoS ONE
Accession number :
edsair.doi.dedup.....29bef61d6d188c67f7295b5b06bf3f4d