Effect of k-tuple length on sample-comparison with high-throughput sequencing data.
- Source :
- Biochemical & Biophysical Research Communications. Jan 2016, Vol. 469 Issue 4, p1021-1027. 7p.
- Publication Year :
- 2016
Abstract
- High-throughput metagenomic sequencing offers a powerful technique for comparing microbial communities. Alignment-free models based on short k-tuples (k = 2–10 bp) have yielded promising results without requiring extra reference sequences. Short k-tuples describe the overall statistical distribution, but they struggle to capture the specific characteristics within a single microbial community. Longer k-tuples carry more information. However, because the frequency vector of long k-tuples (k ≥ 30 bp) is sparse, the statistical measures designed for short k-tuples are not applicable. In our study, we treated each tuple as a meaningful word and each sequencing dataset as a document composed of those words. The comparison between two sequencing datasets is therefore handled as “topic analysis of documents” in text mining. We designed a pipeline that uses long k-tuple features to compare metagenomic samples, combining algorithms from text mining and pattern recognition. The pipeline is available at http://culotuple.codeplex.com/ . Experiments show that our pipeline with long k-tuple features: ① separates genomes with high similarity; ② outperforms short k-tuple models in all experiments (when k ≥ 12, the short k-tuple measures are no longer applicable; when k is between 20 and 40, the long k-tuple pipeline obtains much better grouping results); ③ is free from the effect of sequencing platforms/protocols; ④ yields meaningful, biologically supported results on the 40-tuples selected for comparison. [ABSTRACT FROM AUTHOR]
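The "tuple as word, sample as document" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`ktuples`, `sample_to_counts`, `tfidf`, `cosine`), the choice of smoothed TF-IDF weighting, and the use of cosine similarity are all assumptions made here for illustration, and the toy reads use k = 4 rather than the k = 20–40 range the abstract reports.

```python
import math
from collections import Counter

def ktuples(read, k):
    """Return every overlapping k-tuple (k-mer) in one read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def sample_to_counts(reads, k):
    """Treat a sample as a 'document': count each k-tuple 'word' across its reads."""
    counts = Counter()
    for read in reads:
        counts.update(ktuples(read, k))
    return counts

def tfidf(docs):
    """Weight each document's word counts by a smoothed inverse document frequency.

    The weighting scheme is an illustrative assumption, not taken from the paper.
    """
    n = len(docs)
    df = Counter()
    for counts in docs:
        df.update(counts.keys())  # each word counted once per document
    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append({
            word: (c / total) * (math.log((1 + n) / (1 + df[word])) + 1.0)
            for word, c in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

if __name__ == "__main__":
    # Hypothetical toy samples; real inputs would be sequencing reads.
    samples = {
        "A": ["ACGTACGTAC"],
        "B": ["ACGTACGTAC"],
        "C": ["TTTTGGGGCC"],
    }
    k = 4
    names = list(samples)
    vectors = tfidf([sample_to_counts(samples[name], k) for name in names])
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(names[i], names[j], round(cosine(vectors[i], vectors[j]), 3))
```

Storing each vector as a dict keeps only the tuples that actually occur. That matters for long k-tuples, because the 4^k possible tuples far exceed the number observed in any sample, which is the sparsity the abstract describes.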
Details
- Language :
- English
- ISSN :
- 0006291X
- Volume :
- 469
- Issue :
- 4
- Database :
- Academic Search Index
- Journal :
- Biochemical & Biophysical Research Communications
- Publication Type :
- Academic Journal
- Accession number :
- 112345607
- Full Text :
- https://doi.org/10.1016/j.bbrc.2015.11.094