1. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver
- Author
BEEHIVE Collaboration, Wymant, Chris, Blanquart, Francois, Golubchik, Tanya, Gall, Astrid, Bakker, Margreet, Bezemer, Daniela, Croucher, Nicholas J., Hall, Matthew, Hillebregt, Mariska, Ong, Swee Hoe, Ratmann, Oliver, Albert, Jan, Bannert, Norbert, Fellay, Jacques, Fransen, Katrien, Gourlay, Annabelle, Grabowski, M. Kate, Gunsenheimer-Bartmeyer, Barbara, Gunthard, Huldrych F., Kivelä, Pia, Kouyos, Roger, Laeyendecker, Oliver, Liitsola, Kirsi, Meyer, Laurence, Porter, Kholoud, Ristola, Matti, van Sighem, Ard, Berkhout, Ben, Cornelissen, Marion, Kellam, Paul, Reiss, Peter, Fraser, Christophe, Institute for Particle Physics Phenomenology (IPPP), Durham University, Centre interdisciplinaire de recherche en biologie (CIRB), Labex MemoLife, École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Collège de France (CdF (institution))-Ecole Superieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris), Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS), Infection, Anti-microbiens, Modélisation, Evolution (IAME (UMR_S_1137 / U1137)), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris 13 (UP13)-Université Paris Diderot - Paris 7 (UPD7)-Université Sorbonne Paris Cité (USPC), Stichting HIV Monitoring [Amsterdam], Universiteit van Amsterdam (UvA), Evolution and Ecology Research Center, University of New South Wales [Sydney] (UNSW), Department of Infectious Disease Epidemiology [London] (DIDE), Imperial College London, Ecole Polytechnique Fédérale de Lausanne (EPFL), University College of London [London] (UCL), Universität Zürich [Zürich] = University of Zurich (UZH), Department of Infectious Diseases and Hospital Epidemiology [Zurich], University 
hospital of Zurich [Zurich], Department of Medicine, The Johns Hopkins University School of Medicine-Division of Infectious Diseases, Centre de recherche en épidémiologie et santé des populations (CESP), Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Université Paris-Sud - Paris 11 (UP11)-Hôpital Paul Brousse-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Versailles Saint-Quentin-en-Yvelines (UVSQ), Center for Infection and Immunity Amsterdam (CINIMA), Wellcome Trust Genome Campus, Structures et propriétés d'architectures moléculaire (SPRAM - UMR 5819), Institut Nanosciences et Cryogénie (INAC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut de Chimie du CNRS (INC)-Centre National de la Recherche Scientifique (CNRS), Big Data Institute, University of Oxford, École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS-PSL), University of Cambridge [UK] (CAM), Laboratory of Experimental Virology - Department of Medical Microbiology [Amsterdam, The Netherlands], Academic Medical Center - Academisch Medisch Centrum [Amsterdam] (AMC), University of Amsterdam [Amsterdam] (UvA)-University of Amsterdam [Amsterdam] (UvA)-Center for Infection and Immunity Amsterdam - CINIMA [Amsterdam, The Netherlands], The Wellcome Trust Sanger Institute [Cambridge], Karolinska Institutet [Stockholm], Robert Koch Institute [Berlin] (RKI), Johns Hopkins University (JHU), Helsinki University Hospital [Finland] (HUS), Division of Intramural Research [Bethesda, MD, USA] (Cardiovascular Branch), National Institutes of Health [Bethesda] (NIH)-National Heart, Lung, and Blood Institute [Bethesda] (NHLBI), Université de Versailles Saint-Quentin-en-Yvelines (UVSQ)-Université Paris-Sud - 
Paris 11 (UP11)-Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpital Paul Brousse-Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris-Saclay, The BEEHIVE Collaboration, European Project: 339251,EC:FP7:ERC,ERC-2013-ADG,BEEHIVE(2014), AII - Infectious diseases, Medical Microbiology, APH - Aging & Later Life, Infectious diseases, Global Health, Clinicum, Infektiosairauksien yksikkö, HUS Inflammation Center, HUS Internal Medicine and Rehabilitation, and Bill & Melinda Gates Foundation
- Subjects
0301 basic medicine ,PROTEASE ,Computer science ,Sequence assembly ,RECOMBINATION ,Computational biology ,Microbiology ,Genome ,DNA sequencing ,diversity ,Set (abstract data type) ,03 medical and health sciences ,Virology ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,ddc:610 ,mapping ,TYPE-1 ,ComputingMilieux_MISCELLANEOUS ,Sequence (medicine) ,Contig ,IDENTIFICATION ,[SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE] ,BEEHIVE Collaboration ,HIV ,INSERTIONS ,food and beverages ,bioinformatics ,TRANSFORM ,GENE ,Resources ,3. Good health ,Identification (information) ,ALIGNMENT ,030104 developmental biology ,3121 General medicine, internal medicine and other clinical medicine ,HUMAN-IMMUNODEFICIENCY-VIRUS ,[SDV.MP.VIR]Life Sciences [q-bio]/Microbiology and Parasitology/Virology ,genome assembly ,next-generation sequencing ,3111 Biomedicine ,610 Medizin und Gesundheit ,INHIBITORS ,Reference genome
- Abstract
Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. 
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
- Published
- 2018