1. Manipulating base quality scores enables variant calling from bisulfite sequencing alignments using conventional Bayesian approaches
- Author
- Mario Fasold, David Langenberger, Christian Otto, Adam Nunn, and Peter F. Stadler
- Subjects
Epigenomics, Genotype, Computer science, Bisulfite sequencing, Bayesian probability, SNP, Computational biology, chemistry.chemical_compound, Genetics, Humans, Sulfites, Epigenetics, Allele, Genotyping, Genetic variant, DNA methylation, High-Throughput Nucleotide Sequencing, Bayes Theorem, Sequence Analysis, DNA, Base (topology), Benchmarking, chemistry, Sequence Alignment, DNA, Biotechnology
- Abstract
Background: Calling germline SNP variants from bisulfite-converted sequencing data poses a challenge for conventional software, which has no inherent capability to dissociate true polymorphisms from artificial mutations induced by the chemical treatment. Nevertheless, SNP data is desirable both for genotyping and for understanding the DNA methylome in the context of the genetic background. The confounding effect of bisulfite conversion can, however, be conceptually resolved by observing differences in allele counts on a per-strand basis, whereby artificial mutations are reflected by non-complementary base pairs. Results: Herein, we present a computational pre-processing approach for adapting sequence alignment data, thus indirectly enabling downstream analysis on a per-strand basis using conventional variant calling software such as GATK or Freebayes. In comparison to specialised tools, the method represents a marked improvement in precision-sensitivity based on high-quality, published benchmark datasets for both human and model plant variants. Conclusion: The presented “double-masking” procedure represents an open source, easy-to-use method to facilitate accurate variant calling using conventional software, thus negating any dependency on specialised tools and mitigating the need to generate additional, conventional sequencing libraries alongside bisulfite sequencing experiments. The method is available at https://github.com/bio15anu/revelio and an implementation with Freebayes is available at https://github.com/EpiDiverse/SNP
- Published
- 2022
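
The per-strand reasoning in the abstract can be illustrated with a minimal sketch. It is not the authors' double-masking implementation and not a Bayesian caller; it only shows why strand-specific allele counts separate conversion artifacts from true C>T polymorphisms. The function name `classify_ct_site`, the `min_alt_fraction` threshold, and the strand labels are hypothetical. A directional library is assumed, with all bases given in forward-reference coordinates: at a reference C, reads from the original top strand may show T because of bisulfite conversion, whereas reads from the original bottom strand are unaffected at that position and show T only when the variant is real.

```python
from collections import Counter


def classify_ct_site(top_bases: str, bottom_bases: str,
                     min_alt_fraction: float = 0.2) -> str:
    """Classify a reference-C site from per-strand base observations.

    top_bases:    bases seen in reads from the original top strand
                  (T here is ambiguous: conversion or variant).
    bottom_bases: bases seen in reads from the original bottom strand,
                  in forward coordinates (informative for C>T).
    min_alt_fraction: hypothetical cutoff, chosen only for illustration.
    """
    if not bottom_bases:
        # Without reads from the informative strand, T cannot be interpreted.
        return "uninformative"

    top = Counter(top_bases)
    bottom = Counter(bottom_bases)

    # Fraction of T on the strand that bisulfite treatment does not alter here.
    alt_fraction = bottom["T"] / len(bottom_bases)
    if alt_fraction >= min_alt_fraction:
        return "snp"

    # T only on the convertible strand: a non-complementary T:G pair.
    if top["T"] > 0:
        return "bisulfite_artifact"
    return "reference"


if __name__ == "__main__":
    print(classify_ct_site("TTTTC", "CCCCC"))
    print(classify_ct_site("TTTT", "TTCC"))
```

A real caller would weigh base qualities and genotype likelihoods rather than apply a fixed fraction; the sketch only makes the strand asymmetry explicit.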