A crowdsourced set of curated structural variants for the human genome
- Source :
- PLoS Computational Biology, Vol 16, Iss 6, p e1007933 (2020)
- Publication Year :
- 2020
- Publisher :
- Public Library of Science (PLoS), 2020.
Abstract
- A high-quality benchmark for small variants, encompassing 88 to 90% of the reference genome, has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging to build. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short-, long-, and linked-read sequencing data from the son of the GIAB Ashkenazi Jewish Trio [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and for SVs with inaccurate breakpoint or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
Author summary
- Large genomic changes, called structural variants, can cause a variety of human diseases but have been challenging to detect with conventional DNA sequencing methods. In the Genome in a Bottle Consortium, we are developing authoritatively characterized genomes with benchmark structural variants that anyone can use to assess the accuracy of their sequencing and analysis methods. Manual curation of sequencing reads from multiple technologies has been essential for establishing benchmark variant calls. Here, we present consensus curations from a web-based platform that displays a comprehensive set of visualizations of sequencing-read support for structural variants. We use the svviz visualization tool to present evidence not only for deletions but also for insertions, which previously could not be curated. We derive consensus calls from the multiple curations of each variant and find that these are highly concordant with a draft Genome in a Bottle structural variant benchmark set.
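The consensus and concordance steps summarized above can be illustrated with a minimal sketch. It assumes a simple majority vote with a fixed agreement threshold; the abstract does not state the exact rule used, so the label strings (e.g. "DEL_het"), the 0.7 threshold, and the function names build_consensus, consensus_label, and curator_concordance are all hypothetical and are not taken from SVCurator.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical data layout: SV event ID -> {curator ID -> label string}.
Curations = Dict[str, Dict[str, str]]


def consensus_label(votes: Dict[str, str], min_agreement: float = 0.7) -> Optional[str]:
    """Return the majority label if enough curators agree, else None.

    With min_agreement > 0.5, a tie between labels can never pass the
    threshold, so the arbitrary tie order of most_common() does not matter.
    """
    if not votes:
        return None
    label, count = Counter(votes.values()).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def build_consensus(curations: Curations, min_agreement: float = 0.7) -> Dict[str, str]:
    """Keep only events whose majority label passes the (assumed) agreement threshold."""
    consensus = {}
    for event, votes in curations.items():
        label = consensus_label(votes, min_agreement)
        if label is not None:
            consensus[event] = label
    return consensus


def curator_concordance(curator: str, curations: Curations,
                        reference: Dict[str, str]) -> Optional[float]:
    """Fraction of one curator's labels that match a reference label set
    (for example, labels from expert curators). None if there is no overlap."""
    shared = [e for e, votes in curations.items() if curator in votes and e in reference]
    if not shared:
        return None
    matches = sum(curations[e][curator] == reference[e] for e in shared)
    return matches / len(shared)


if __name__ == "__main__":
    curations = {
        "sv1": {"a": "DEL_het", "b": "DEL_het", "c": "DEL_het"},
        "sv2": {"a": "INS_hom", "b": "INS_het", "c": "not_sv"},  # no agreement: filtered out
        "sv3": {"a": "DEL_hom", "b": "DEL_hom", "c": "DEL_het", "d": "DEL_hom"},
    }
    consensus = build_consensus(curations)
    print(consensus)  # {'sv1': 'DEL_het', 'sv3': 'DEL_hom'}
    print(curator_concordance("c", curations, consensus))  # 0.5
```

In this sketch, events where curators disagree (like "sv2") receive no consensus label, mirroring the idea of keeping only high-confidence events; a real analysis might weight curators or treat type, size accuracy, and genotype as separate labels.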
- Subjects :
- 0301 basic medicine
Heredity
Computer science
Genome
Database and Informatics Methods
0302 clinical medicine
INDEL Mutation
Heuristics
Genome Sequencing
Biology (General)
Ecology
Genomics
Genetic Mapping
Tandem Repeats
Computational Theory and Mathematics
Modeling and Simulation
Sequence Analysis
Research Article
Bioinformatics
QH301-705.5
Concordance
Variant Genotypes
Computational biology
Research and Analysis Methods
Genome Complexity
DNA sequencing
Set (abstract data type)
03 medical and health sciences
Cellular and Molecular Neuroscience
Genetics
Humans
Repeated Sequences
Molecular Biology Techniques
Sequencing Techniques
Indel
Molecular Biology
Alleles
Ecology, Evolution, Behavior and Systematics
Genome, Human
Biology and Life Sciences
Computational Biology
Genome Analysis
030104 developmental biology
Haplotypes
Genetic Loci
Genomic Structural Variation
Human genome
Sequence Alignment
030217 neurology & neurosurgery
Reference genome
Details
- Language :
- English
- ISSN :
- 1553-734X
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....b519ca4c264c97671517aa93577cb7d9