Towards complete and error-free genome assemblies of all vertebrate species

Authors :
Rhie, Arang
McCarthy, Shane A.
Fedrigo, Olivier
Damas, Joana
Formenti, Giulio
Koren, Sergey
Uliano-Silva, Marcela
Chow, William
Fungtammasan, Arkarachai
Kim, Juwan
Lee, Chul
Ko, Byung June
Chaisson, Mark
Gedman, Gregory L.
Cantin, Lindsey J.
Thibaud-Nissen, Francoise
Haggerty, Leanne
Bista, Iliana
Smith, Michelle
Haase, Bettina
Mountcastle, Jacquelyn
Winkler, Sylke
Paez, Sadye
Howard, Jason
Vernes, Sonja C.
Lama, Tanya M.
Grutzner, Frank
Warren, Wesley C.
Balakrishnan, Christopher N.
Burt, Dave
George, Julia M.
Biegler, Matthew T.
Iorns, David
Digby, Andrew
Eason, Daryl
Robertson, Bruce
Edwards, Taylor
Wilkinson, Mark
Turner, George
Meyer, Axel
Kautt, Andreas F.
Franchini, Paolo
Detrich, H. William
Svardal, Hannes
Wagner, Maximilian
Naylor, Gavin J. P.
Pippel, Martin
Malinsky, Milan
Mooney, Mark
Simbirsky, Maria
Hannigan, Brett T.
Pesout, Trevor
Houck, Marlys
Misuraca, Ann
Kingan, Sarah B.
Hall, Richard
Kronenberg, Zev
Sović, Ivan
Dunn, Christopher
Ning, Zemin
Hastie, Alex
Lee, Joyce
Selvaraj, Siddarth
Green, Richard E.
Putnam, Nicholas H.
Gut, Ivo
Ghurye, Jay
Garrison, Erik
Sims, Ying
Collins, Joanna
Pelan, Sarah
Torrance, James
Tracey, Alan
Wood, Jonathan
Dagnew, Robel E.
Guan, Dengfeng
London, Sarah E.
Clayton, David F.
Mello, Claudio V.
Friedrich, Samantha R.
Lovell, Peter V.
Osipova, Ekaterina
Al-Ajli, Farooq O.
Secomandi, Simona
Kim, Heebal
Theofanopoulou, Constantina
Hiller, Michael
Zhou, Yang
Harris, Robert S.
Makova, Kateryna D.
Medvedev, Paul
Hoffman, Jinna
Masterson, Patrick
Clark, Karen
Martin, Fergal
Howe, Kevin
Flicek, Paul
Walenz, Brian P.
Kwak, Woori
Clawson, Hiram
Diekhans, Mark
Nassar, Luis
Paten, Benedict
Kraus, Robert H. S.
Crawford, Andrew J.
Gilbert, M. Thomas P.
Zhang, Guojie
Venkatesh, Byrappa
Murphy, Robert W.
Koepfli, Klaus-Peter
Shapiro, Beth
Johnson, Warren E.
Di Palma, Federica
Marques-Bonet, Tomas
Teeling, Emma C.
Warnow, Tandy
Graves, Jennifer Marshall
Ryder, Oliver A.
Haussler, David
O’Brien, Stephen J.
Korlach, Jonas
Lewin, Harris A.
Howe, Kerstin
Myers, Eugene W.
Durbin, Richard
Phillippy, Adam M.
Jarvis, Erich D.
Source :
Nature; April 2021, Vol. 592 Issue: 7856 p737-746, 10p
Publication Year :
2021

Abstract

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1–4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Details

Language :
English
ISSN :
00280836 and 14764687
Volume :
592
Issue :
7856
Database :
Supplemental Index
Journal :
Nature
Publication Type :
Periodical
Accession number :
ejs56021595
Full Text :
https://doi.org/10.1038/s41586-021-03451-0