Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities.

Authors :
Kaisers W
Schwender H
Schaal H
Source :
International journal of molecular sciences [Int J Mol Sci] 2018 Nov 21; Vol. 19 (11). Date of Electronic Publication: 2018 Nov 21.
Publication Year :
2018

Abstract

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicated the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides a non-specific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
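
The general idea in the abstract (read Fastq records sequentially, count k-mers per file, then cluster the count profiles) can be illustrated with a minimal Python sketch. This is not the seqTools implementation or its API. The function names, the toy reads, the choice of k = 2, the Canberra distance, and average linkage are all assumptions made for illustration only, and the reader assumes strict four-line Fastq records.

```python
import io
from collections import Counter
from itertools import product


def iter_fastq_reads(handle):
    """Yield the sequence line of each record, one at a time.

    Assumes strict 4-line Fastq records, so only one line is held at once.
    """
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip()


def kmer_counts(reads, k):
    """Count DNA k-mers over A, C, G, T; k-mers containing N are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def count_vector(counts, k):
    """Relative k-mer frequencies in fixed lexicographic order (4**k entries)."""
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def canberra(u, v):
    """Canberra distance; components that are zero in both vectors are skipped."""
    return sum(abs(a - b) / (a + b) for a, b in zip(u, v) if a + b > 0)


def average_linkage(vectors):
    """Naive agglomerative clustering.

    Returns the merges in order as (members_a, members_b, distance).
    """
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = clusters[keys[x]], clusters[keys[y]]
                d = sum(canberra(vectors[i], vectors[j]) for i in a for j in b)
                d /= len(a) * len(b)
                if best is None or d < best[0]:
                    best = (d, keys[x], keys[y])
        d, ka, kb = best
        merges.append((sorted(clusters[ka]), sorted(clusters[kb]), d))
        clusters[ka] = clusters[ka] + clusters.pop(kb)
    return merges


def toy_fastq(reads):
    """Build an in-memory Fastq file from toy reads (hypothetical data)."""
    records = (f"@r{i}\n{r}\n+\n{'I' * len(r)}\n" for i, r in enumerate(reads))
    return io.StringIO("".join(records))


# Four toy "files": two AT-rich and two GC-rich samples (invented for this sketch).
samples = {
    "a1": ["ATATATAT", "AATTAATT"],
    "a2": ["ATATTATA", "TTAATTAA"],
    "g1": ["GCGCGCGC", "GGCCGGCC"],
    "g2": ["GCGGCGCC", "CCGGCCGG"],
}
K = 2
vectors = [
    count_vector(kmer_counts(iter_fastq_reads(toy_fastq(reads)), K), K)
    for reads in samples.values()
]
merges = average_linkage(vectors)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

In this toy setup the last merge joins the two AT-rich samples with the two GC-rich samples, which is the kind of group structure a tree would expose. Memory use is bounded by the 4^k count table per file rather than by the number of reads, which reflects the sequential, low-memory reading described in the abstract.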

Details

Language :
English
ISSN :
1422-0067
Volume :
19
Issue :
11
Database :
MEDLINE
Journal :
International journal of molecular sciences
Publication Type :
Academic Journal
Accession number :
30469355
Full Text :
https://doi.org/10.3390/ijms19113687