Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells
- Source :
- BMC Bioinformatics, 17, 151
- Publication Year :
- 2015
Abstract
- Background: Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations, and can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue applies, for instance, to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells by supplying these cells with unique heritable DNA tags.
Results: Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences.
Conclusions: Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0999-4) contains supplementary material, which is available to authorized users.
- Subjects :
- 0301 basic medicine
Computational biology
Biology
Lineage tracing
Biochemistry
Polymerase Chain Reaction
DNA sequencing
Deep sequencing
03 medical and health sciences
Mice
Illumina
Structural Biology
Next generation sequencing
Animals
DNA Barcoding, Taxonomic
Molecular Biology
PCR error
Illumina dye sequencing
Sequencing error
Genetics
Biological data
Massive parallel sequencing
Base Sequence
Applied Mathematics
Stem Cells
Methodology Article
DNA sequencing theory
High-Throughput Nucleotide Sequencing
DNA
Sequence Analysis, DNA
Computer Science Applications
030104 developmental biology
Single cell sequencing
Cellular barcoding
DNA microarray
Details
- ISSN :
- 14712105
- Volume :
- 17
- Database :
- OpenAIRE
- Journal :
- BMC bioinformatics
- Accession number :
- edsair.doi.dedup.....d118d898ed625f4a3e8a32bf96cee7a4