1. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data
- Author
Susana Posada-Cespedes, Ivan Topolsky, Karin J. Metzner, Kim Philipp Jablonski, Niko Beerenwinkel, David Seifert, and University of Zurich
- Subjects
Statistics and Probability, 10028 Institute of Medical Virology, 1303 Biochemistry, AcademicSubjects/SCI01060, Computer science, Virulence, Genomics, 610 Medicine & health, Computational biology, medicine.disease_cause, Biochemistry, Virus, 10234 Clinic for Infectious Diseases, 03 medical and health sciences, medicine, 1312 Molecular Biology, 1706 Computer Science Applications, 2613 Statistics and Probability, Molecular Biology, 030304 developmental biology, 0303 health sciences, Genetic diversity, Mutation, business.industry, 030302 biochemistry & molecular biology, Haplotype, Modular design, Pipeline (software), Original Papers, Computer Science Applications, Computational Mathematics, Identification (information), Computational Theory and Mathematics, Mutation (genetic algorithm), business, 2605 Computational Mathematics, Sequence Analysis, 1703 Computational Theory and Mathematics
- Abstract
Motivation: High-throughput sequencing technologies are used increasingly not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations.
Results: To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape.
Bioinformatics, 37 (12), ISSN: 1367-4803, ISSN: 1460-2059
- Published
- 2021