A crowdsourced set of curated structural variants for the human genome

Authors :
Eric T. Dawson [0000-0001-5448-1653]
Chunlin Xiao
Noah Alexander
Sree Rohit Raj Kolora [0000-0001-7839-735X]
Lesley M. Chapman [0000-0001-7413-4392]
Aaron M. Wenger [0000-0003-1183-0432]
Christopher M. Watson [0000-0003-2371-1844]
Giuseppe Narzisi [0000-0003-1118-8849]
Justin M. Zook
Daniel Blankenberg
Christian Brueffer [0000-0002-3826-0989]
Graeme C. Smith [0000-0002-7413-4998]
Marc L. Salit [0000-0003-1624-5195]
Azza Ahmed [0000-0002-1358-8371]
Paul Wolujewicz [0000-0003-2982-9448]
Saadlee Shehreen [0000-0002-4869-0747]
Naoki Nariai
Patrick Pai [0000-0001-5304-788X]
Christos Proukakis [0000-0001-6423-6539]
Andrew Carroll [0000-0002-4824-6689]
Garan Jones [0000-0002-8917-3930]
Wayne E. Clarke [0000-0003-2471-0712]
Noah Spies [0000-0002-6759-9842]
Chun Shen Lim [0000-0001-7015-0125]
Apollo - University of Cambridge Repository
Source :
PLoS Computational Biology, Vol 16, Iss 6, p e1007933 (2020)
Publication Year :
2020
Publisher :
Public Library of Science (PLoS), 2020.

Abstract

A high-quality benchmark for small variants, encompassing 88 to 90% of the reference genome, has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging to produce. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short-, long-, and linked-read sequencing data from the son in the GIAB Ashkenazi Jewish Trio [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. ‘Expert’ curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of ‘expert’ curators. The curators were least concordant for complex SVs and for SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.

Author summary

Large genomic changes, called structural variants, can cause a variety of human diseases but have been challenging to detect with conventional DNA sequencing methods. We are working in the Genome in a Bottle Consortium to develop authoritatively characterized genomes with benchmark structural variants that anyone can use to assess the accuracy of their sequencing and analysis methods. Manual curation of the sequencing reads from multiple technologies has been essential to establish benchmark variant calls. Here, we present consensus curations from a web-based platform that displays a comprehensive set of visualizations of sequencing read support for structural variants. We use the svviz visualization tool to present evidence not only for deletions but also for insertions, which previously could not be curated. We derive consensus calls from the multiple curations of each variant, and we find these are highly concordant with a draft Genome in a Bottle structural variant benchmark set.
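The abstract describes two computations: concordance between a curator and a reference set of labels, and consensus labels kept only when curators agree strongly enough. The record does not give the exact definitions or cutoffs, so the following is only a minimal sketch under stated assumptions, not the authors' method: each curation is assumed to be a single categorical label per event (for example a genotype string), consensus is assumed to be a simple majority vote, and the `min_agreement` cutoff of 0.7 is a hypothetical value. The names `consensus`, `concordance`, and `Curations` are illustrative.

```python
from collections import Counter

# Hypothetical input layout: event ID -> {curator ID -> label}.
# Labels are plain strings (e.g. "Het", "Hom", "Not SV"); the real
# SVCurator label scheme may differ.
Curations = dict[str, dict[str, str]]


def consensus(curations: Curations, min_agreement: float = 0.7) -> dict[str, str]:
    """Return the majority label per event, keeping only events whose
    top label reaches the (assumed) min_agreement fraction of votes."""
    kept = {}
    for event, votes in curations.items():
        if not votes:
            continue
        label, count = Counter(votes.values()).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            kept[event] = label
    return kept


def concordance(curations: Curations, curator: str, reference: dict[str, str]) -> float:
    """Fraction of events labeled by `curator` (and present in `reference`)
    where the curator's label matches the reference label; NaN if none."""
    shared = [e for e, votes in curations.items() if curator in votes and e in reference]
    if not shared:
        return float("nan")
    matches = sum(curations[e][curator] == reference[e] for e in shared)
    return matches / len(shared)


if __name__ == "__main__":
    demo = {
        "e1": {"a": "Het", "b": "Het", "c": "Het"},
        "e2": {"a": "Hom", "b": "Het", "c": "Not SV"},
    }
    ref = consensus(demo)
    print(ref)                            # {'e1': 'Het'}
    print(concordance(demo, "b", ref))    # 1.0
```

With any cutoff above 0.5, a tied vote can never pass the filter, so tie-breaking in `Counter.most_common` does not affect which events are kept. In practice the reference passed to `concordance` could be a consensus computed from expert curators only, mirroring the expert comparison described in the abstract.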

Details

Language :
English
ISSN :
1553-734X
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....b519ca4c264c97671517aa93577cb7d9