Effect of k-tuple length on sample-comparison with high-throughput sequencing data.

Authors :
Wang, Ying
Lei, Xiaoye
Wang, Shun
Wang, Zicheng
Song, Nianfeng
Zeng, Feng
Chen, Ting
Source :
Biochemical & Biophysical Research Communications. Jan2016, Vol. 469 Issue 4, p1021-1027. 7p.
Publication Year :
2016

Abstract

High-throughput metagenomic sequencing offers a powerful technique for comparing microbial communities. Without requiring extra reference sequences, alignment-free models with short k-tuples (k = 2–10 bp) have yielded promising results. Short k-tuples describe the overall statistical distribution, but they struggle to capture the specific characteristics of an individual microbial community. Longer k-tuples carry more information. However, because the frequency vector of long k-tuples (k ≥ 30 bp) is sparse, the statistical measures designed for short k-tuples are not applicable. In our study, we treated each tuple as a meaningful word and each sequencing data set as a document composed of those words. The comparison between two sequencing data sets is therefore processed as “topic analysis of documents” in text mining. We designed a pipeline with long k-tuple features that compares metagenomic samples using a combination of algorithms from text mining and pattern recognition. The pipeline is available at http://culotuple.codeplex.com/. Experiments show that our pipeline with long k-tuple features: ① separates genomes with high similarity; ② outperforms short k-tuple models in all experiments (when k ≥ 12, the short k-tuple measures are no longer applicable; when k is between 20 and 40, the long k-tuple pipeline obtains much better grouping results); ③ is free from the effect of sequencing platforms/protocols. ④ We obtained meaningful and supported biological results on the 40-tuples selected for comparison. [ABSTRACT FROM AUTHOR]
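
The "tuples as words, samples as documents" idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`ktuple_counts`, `cosine_similarity`), the toy reads, the choice of k = 4, and the use of cosine similarity in place of the topic analysis the abstract mentions are all assumptions made for illustration. It shows only how a sample becomes a sparse bag of k-tuple counts and how two such bags can be compared without materializing the full 4^k frequency vector.

```python
from collections import Counter
from math import sqrt


def ktuple_counts(reads, k):
    """Count every overlapping k-tuple (the 'words') across one sample's reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if "N" not in word:  # skip tuples containing ambiguous bases
                counts[word] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(v * b[w] for w, v in a.items() if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


# Toy reads, invented for illustration only (hypothetical data).
sample_a = ["ACGTACGTAC", "GTACGTACGT"]
sample_b = ["ACGTACGTAC", "TTTTGGGGCC"]
sample_c = ["TTTTGGGGCC", "GGGGCCTTTT"]

k = 4  # tiny k for the toy reads; the abstract discusses k between 20 and 40
doc_a, doc_b, doc_c = (ktuple_counts(s, k) for s in (sample_a, sample_b, sample_c))
sim_ab = cosine_similarity(doc_a, doc_b)  # samples sharing a read
sim_ac = cosine_similarity(doc_a, doc_c)  # samples sharing no k-tuples
```

Storing only the observed tuples in a dictionary keeps memory proportional to the number of distinct words in the sample, which is what makes long, sparse k-tuple vectors tractable in a sketch like this.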

Details

Language :
English
ISSN :
0006291X
Volume :
469
Issue :
4
Database :
Academic Search Index
Journal :
Biochemical & Biophysical Research Communications
Publication Type :
Academic Journal
Accession number :
112345607
Full Text :
https://doi.org/10.1016/j.bbrc.2015.11.094