
High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.

Authors :
Gustafson JA
Gibson SB
Damaraju N
Zalusky MPG
Hoekzema K
Twesigomwe D
Yang L
Snead AA
Richmond PA
De Coster W
Olson ND
Guarracino A
Li Q
Miller AL
Goffena J
Anderson ZB
Storz SHR
Ward SA
Sinha M
Gonzaga-Jauregui C
Clarke WE
Basile AO
Corvelo A
Reeves C
Helland A
Musunuri RL
Revsine M
Patterson KE
Paschal CR
Zakarian C
Goodwin S
Jensen TD
Robb E
McCombie WR
Sedlazeck FJ
Zook JM
Montgomery SB
Garrison E
Kolmogorov M
Schatz MC
McLaughlin RN Jr
Dashnow H
Zody MC
Loose M
Jain M
Eichler EE
Miller DE
Source :
Genome research [Genome Res] 2024 Nov 20; Vol. 34 (11), pp. 2061-2073. Date of Electronic Publication: 2024 Nov 20.
Publication Year :
2024

Abstract

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
(© 2024 Gustafson et al.; Published by Cold Spring Harbor Laboratory Press.)

Details

Language :
English
ISSN :
1549-5469
Volume :
34
Issue :
11
Database :
MEDLINE
Journal :
Genome research
Publication Type :
Academic Journal
Accession number :
39358015
Full Text :
https://doi.org/10.1101/gr.279273.124