
Towards complete and error-free genome assemblies of all vertebrate species

Authors :
Richard Hall
Tandy Warnow
Tanya M. Lama
Oliver A. Ryder
David Haussler
Matthew T. Biegler
Klaus-Peter Koepfli
Ivo Gut
Paul Flicek
Mark Chaisson
James Torrance
Guojie Zhang
Andrew J. Crawford
Federica Di Palma
Michael Hiller
Jennifer A. Marshall Graves
Sadye Paez
Sarah E. London
Mark Wilkinson
Kateryna D. Makova
Byung June Ko
Julia M. George
Farooq O. Al-Ajli
Emma C. Teeling
George F. Turner
Robert H. S. Kraus
Sonja C. Vernes
Zev N. Kronenberg
Michelle Smith
Jonas Korlach
Daryl Eason
Jonathan Wood
Simona Secomandi
Claudio V. Mello
Arkarachai Fungtammasan
Arang Rhie
Tomas Marques-Bonet
Benedict Paten
Ekaterina Osipova
Richard Durbin
M. Thomas P. Gilbert
Beth Shapiro
Ivan Sović
Bruce C. Robertson
Richard E. Green
Eugene W. Myers
Leanne Haggerty
Sergey Koren
Martin Pippel
Bettina Haase
Patrick Masterson
Jay Ghurye
Maria Simbirsky
Samantha R. Friedrich
Chul Hee Lee
Luis R. Nassar
Lindsey J. Cantin
Kerstin Howe
Erich D. Jarvis
Marlys L. Houck
Jason T. Howard
Jacquelyn Mountcastle
Mark Mooney
Paolo Franchini
Giulio Formenti
Siddarth Selvaraj
Robel E. Dagnew
Brett T. Hannigan
Brian P. Walenz
Alan Tracey
Heebal Kim
Constantina Theofanopoulou
Nicholas H. Putnam
Karen Clark
Iliana Bista
H. William Detrich
Dengfeng Guan
David Iorns
Andrew Digby
Trevor Pesout
Zemin Ning
Gregory Gedman
Woori Kwak
Maximilian Wagner
Joanna Collins
Harris A. Lewin
Hannes Svardal
Milan Malinsky
Byrappa Venkatesh
Françoise Thibaud-Nissen
Joana Damas
Andreas F. Kautt
Olivier Fedrigo
Christopher Dunn
William Chow
Warren E. Johnson
Yang Zhou
Adam M. Phillippy
Taylor Edwards
Paul Medvedev
Peter V. Lovell
Joyce V. Lee
Sylke Winkler
Stephen J. O'Brien
Wesley C. Warren
Alex Hastie
Marcela Uliano-Silva
Kevin L. Howe
Sarah B. Kingan
Fergal J. Martin
Christopher N. Balakrishnan
David F. Clayton
Ying Sims
Robert W. Murphy
Axel Meyer
Dave W. Burt
Shane A. McCarthy
Sarah Pelan
Erik Garrison
Mark Diekhans
Frank Grützner
Gavin J. P. Naylor
Robert S. Harris
Hiram Clawson
Jinna Hoffman
Ann C. Misuraca
J. H. Kim
University of St Andrews. School of Biology
University of St Andrews. St Andrews Bioinformatics Unit
Rhie, Arang [0000-0002-9809-8127]
Fedrigo, Olivier [0000-0002-6450-7551]
Formenti, Giulio [0000-0002-7554-5991]
Koren, Sergey [0000-0002-1472-8962]
Uliano-Silva, Marcela [0000-0001-6723-4715]
Thibaud-Nissen, Françoise [0000-0003-4957-7807]
Mountcastle, Jacquelyn [0000-0003-1078-4905]
Winkler, Sylke [0000-0002-0915-3316]
Vernes, Sonja C. [0000-0003-0305-4584]
Grützner, Frank [0000-0002-3088-7314]
Balakrishnan, Christopher N. [0000-0002-0788-0659]
Burt, Dave [0000-0002-9991-1028]
George, Julia M. [0000-0001-6194-6914]
Digby, Andrew [0000-0002-1870-8811]
Robertson, Bruce [0000-0002-5348-2731]
Edwards, Taylor [0000-0002-7235-6175]
Meyer, Axel [0000-0002-0888-8193]
Kautt, Andreas F. [0000-0001-7792-0735]
Franchini, Paolo [0000-0002-8184-1463]
Detrich, H. William, III [0000-0002-0783-4505]
Pippel, Martin [0000-0002-8134-5929]
Malinsky, Milan [0000-0002-1462-6317]
Kingan, Sarah B. [0000-0002-4900-0189]
Hall, Richard [0000-0001-6490-8227]
Dunn, Christopher [0000-0002-0601-3254]
Lee, Joyce [0000-0002-3492-1102]
Putnam, Nicholas H. [0000-0002-1315-782X]
Gut, Ivo [0000-0001-7219-632X]
Tracey, Alan [0000-0002-4805-9058]
Guan, Dengfeng [0000-0002-6376-3940]
London, Sarah E. [0000-0002-7839-2644]
Clayton, David F. [0000-0002-6395-3488]
Mello, Claudio V. [0000-0002-9826-8421]
Friedrich, Samantha R. [0000-0003-0570-6080]
Osipova, Ekaterina [0000-0002-6769-7223]
Al-Ajli, Farooq O. [0000-0002-4692-7106]
Secomandi, Simona [0000-0001-8597-6034]
Kim, Heebal [0000-0003-3064-1303]
Theofanopoulou, Constantina [0000-0003-2014-7563]
Zhou, Yang [0000-0003-1247-5049]
Martin, Fergal [0000-0002-1672-050X]
Flicek, Paul [0000-0002-3897-7955]
Walenz, Brian P. [0000-0001-8431-1428]
Diekhans, Mark [0000-0002-0430-0989]
Paten, Benedict [0000-0001-8863-3539]
Crawford, Andrew J. [0000-0003-3153-6898]
Gilbert, M. Thomas P. [0000-0002-5805-7195]
Zhang, Guojie [0000-0001-6860-1521]
Venkatesh, Byrappa [0000-0003-3620-0277]
Shapiro, Beth [0000-0002-2733-7776]
Johnson, Warren E. [0000-0002-5954-186X]
Marques-Bonet, Tomas [0000-0002-5597-3075]
Teeling, Emma C. [0000-0002-3309-1346]
Ryder, Oliver A. [0000-0003-2427-763X]
Haussler, David [0000-0003-1533-4575]
Korlach, Jonas [0000-0003-3047-4250]
Lewin, Harris A. [0000-0002-1043-7287]
Howe, Kerstin [0000-0003-2237-513X]
Myers, Eugene W. [0000-0002-6580-7839]
Durbin, Richard [0000-0002-9130-1006]
Phillippy, Adam M. [0000-0003-2983-8934]
Jarvis, Erich D. [0000-0001-8931-5049]
Apollo - University of Cambridge Repository
National Institutes of Health (US)
National Human Genome Research Institute (US)
Ministry of Health and Welfare (South Korea)
Wellcome Trust
European Molecular Biology Laboratory
Howard Hughes Medical Institute
Rockefeller University
Robert and Rosabel Osborne Endowment
European Commission
National Library of Medicine (US)
Korea Institute of Marine Science & Technology
Ministry of Oceans and Fisheries (South Korea)
Alfred P. Sloan Foundation
Max Planck Society
Maine Department of Inland Fisheries & Wildlife
National Science Foundation (US)
University of Queensland
Science Exchange
Northeastern University (US)
Federal Ministry of Education and Research (Germany)
EMBO
National Key Research and Development Program (China)
Qatar Society of Al-Gannas (Algannas)
Katara Cultural Village
Government of Qatar
Monash University Malaysia
Hessen State Ministry of Higher Education, Research and the Arts
Ministry of Science, Research and Art Baden-Württemberg
Agency for Science, Technology and Research A*STAR (Singapore)
European Research Council
Ministerio de Ciencia, Innovación y Universidades (España)
Fundación 'la Caixa'
Generalitat de Catalunya
Irish Research Council
Danish National Research Foundation
Australian Research Council
Source :
Rhie, A, McCarthy, S A, Fedrigo, O, Damas, J, Formenti, G, Koren, S, Uliano-Silva, M, Chow, W, Fungtammasan, A, Kim, J, Lee, C, Ko, B J, Chaisson, M, Gedman, G L, Cantin, L J, Thibaud-Nissen, F, Haggerty, L, Bista, I, Smith, M, Haase, B, Mountcastle, J, Winkler, S, Paez, S, Howard, J, Vernes, S C, Lama, T M, Grützner, F, Warren, W C, Balakrishnan, C N, Burt, D, George, J M, Biegler, M T, Iorns, D, Digby, A, Eason, D, Robertson, B, Edwards, T, Wilkinson, M, Turner, G, Meyer, A, Kautt, A F, Franchini, P, Detrich, H W, Svardal, H, Wagner, M, Naylor, G J P, Pippel, M, Malinsky, M, Mooney, M, Simbirsky, M, Hannigan, B T, Pesout, T, Houck, M, Misuraca, A, Kingan, S B, Hall, R, Kronenberg, Z, Sović, I, Dunn, C, Ning, Z, Hastie, A, Lee, J, Selvaraj, S, Green, R E, Putnam, N H, Gut, I, Ghurye, J, Garrison, E, Sims, Y, Collins, J, Pelan, S, Torrance, J, Tracey, A, Wood, J, Dagnew, R E, Guan, D, London, S E, Clayton, D F, Mello, C V, Friedrich, S R, Lovell, P V, Osipova, E, Al-Ajli, F O, Secomandi, S, Kim, H, Theofanopoulou, C, Hiller, M, Zhou, Y, Harris, R S, Makova, K D, Medvedev, P, Hoffman, J, Masterson, P, Clark, K, Martin, F, Howe, K, Flicek, P, Walenz, B P, Kwak, W, Clawson, H, Diekhans, M, Nassar, L, Paten, B, Kraus, R H S, Crawford, A J, Gilbert, M T P, Zhang, G, Venkatesh, B, Murphy, R W, Koepfli, K-P, Shapiro, B, Johnson, W E, Di Palma, F, Marques-Bonet, T, Teeling, E C, Warnow, T, Graves, J M, Ryder, O A, Haussler, D, O’Brien, S J, Korlach, J, Lewin, H A, Howe, K, Myers, E W, Durbin, R, Phillippy, A M & Jarvis, E D 2021, 'Towards complete and error-free genome assemblies of all vertebrate species', Nature, vol. 592, no. 7856, pp. 737-746. https://doi.org/10.1038/s41586-021-03451-0. Also deposited in Digital.CSIC, Repositorio Institucional del CSIC, and in the Dipòsit Digital de Documents de la UAB, Universitat Autònoma de Barcelona.
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species[1,2,3,4]. To address this issue, the international Genome 10K (G10K) consortium[5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, lineage-specific chromosome rearrangements, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
We thank them for their permission to publish. A.R., S.K., B.P.W. and A.M.P. were supported by the Intramural Research Program of the NHGRI, NIH (1ZIAHG200398). A.R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). S.A.M., I.B. and R.D. were supported by Wellcome Trust grant WT207492; W.C., M. Smith, Z.N., Y.S., J.C., S. Pelan, J.T., A.T., J.W. and Kerstin Howe by WT206194; L.H., F.M., Kevin Howe and P. Flicek by WT108749/Z/15/Z, WT218328/B/19/Z and the European Molecular Biology Laboratory. O.F. and E.D.J. were supported by Howard Hughes Medical Institute and Rockefeller University start-up funds for this project. J.D. and H.A.L. were supported by the Robert and Rosabel Osborne Endowment. M.U.-S. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (750747). F.T.-N., J. Hoffman, P. Masterson and K.C. were supported by the Intramural Research Program of the NLM, NIH. C.L., B.J.K., J. Kim and H.K. were supported by the Marine Biotechnology Program of KIMST, funded by the Ministry of Oceans and Fisheries, Republic of Korea (20180430). M.C. was supported by a Sloan Research Fellowship (FG-2020-12932). S.C.V. was funded by a Max Planck Research Group award from the Max Planck Society, and a Human Frontier Science Program (HFSP) Research grant (RGP0058/2016). T.M.L., W.E.J. and the Canada lynx genome were funded by the Maine Department of Inland Fisheries & Wildlife (F11AF01099), including when W.E.J. held a National Research Council Research Associateship Award at the Walter Reed Army Institute of Research (WRAIR). C.B. was supported by the NSF (1457541 and 1456612). D.B. was funded by The University of Queensland (HFSP - RGP0030/2015). D.I. was supported by Science Exchange Inc. (Palo Alto, CA). H.W.D. was supported by NSF grants (OPP-0132032 ICEFISH 2004 Cruise, PLR-1444167 and OPP-1955368) and the Marine Science Center at Northeastern University (416). G.J.P.N. and the thorny skate genome were funded by the Lenfest Ocean Program (30884). M.P. was funded by the German Federal Ministry of Education and Research (01IS18026C). M. Malinsky was supported by an EMBO fellowship (ALTF 456-2016). The following authors’ contributions were supported by the NIH: S. Selvaraj (R44HG008118); C.V.M., S.R.F., P.V.L. (R21 DC014432/DC/NIDCD); K.D.M. (R01GM130691); H.C. (5U41HG002371-19); M.D. (U41HG007234); and B.P. (R01HG010485). D.G. was supported by the National Key Research and Development Program of China (2017YFC1201201, 2018YFC0910504 and 2017YFC0907503). F.O.A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar and Monash University Malaysia. C.T. was supported by The Rockefeller University. M. Hiller was supported by the LOEWE-Centre for Translational Biodiversity Genomics (TBG) funded by the Hessen State Ministry of Higher Education, Research and the Arts (HMWK). H.C. was supported by the NHGRI (5U41HG002371-19). R.H.S.K. was funded by the Max Planck Society with computational resources at the bwUniCluster and BinAC funded by the Ministry of Science, Research and the Arts Baden-Württemberg and the Universities of the State of Baden-Württemberg, Germany (bwHPC-C5). B.V. was supported by the Biomedical Research Council of A*STAR, Singapore. T.M.-B. was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (864203), MINECO/FEDER, UE (BFU2017-86471-P), Unidad de Excelencia María de Maeztu, AEI (CEX2018-000792-M), a Howard Hughes International Early Career award, Obra Social “La Caixa” and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). E.C.T. was supported by the European Research Council (ERC-2012-StG311000) and an Irish Research Council Laureate Award. M.T.P.G. was supported by an ERC Consolidator Award 681396-Extinction Genomics, and a Danish National Research Foundation Center Grant (DNRF143). T.W. was supported by the NSF (1458652). J. M. Graves was supported by the Australian Research Council (CEO561477). E.W.M. was partially supported by the German Federal Ministry of Education and Research (01IS18026C).
Complementary sequencing support for the Anna’s hummingbird and several other genomes was provided by Pacific Biosciences, Bionano Genomics, Dovetail Genomics, Arima Genomics, Phase Genomics, 10X Genomics, NRGene, Oxford Nanopore Technologies, Illumina, and DNAnexus. All other sequencing and assembly were conducted at the Rockefeller University, Sanger Institute, and Max Planck Institute Dresden genome labs. Part of this work used the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). We acknowledge funding from the Wellcome Trust (108749/Z/15/Z) and the European Molecular Biology Laboratory.
With funding from the Spanish government through the "Severo Ochoa Centre of Excellence" accreditation (CEX2018-000792-M).

Details

ISSN :
1476-4687 and 0028-0836
Volume :
592
Database :
OpenAIRE
Journal :
Nature
Accession number :
edsair.doi.dedup.....d50f2a74ee7e66ef185d231bcebbd6ea
Full Text :
https://doi.org/10.1038/s41586-021-03451-0