Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations.
- Author
- David, Gabriel; Bertolotti, Alicia; Layer, Ryan; Scofield, Douglas; Hayward, Alexander; Baril, Tobias; Burnett, Hamish A; Gudmunds, Erik; Jensen, Henrik; and Husby, Arild
- Subjects
- *BIRD populations; *ENGLISH sparrow; *CONFIDENCE; *SINGLE nucleotide polymorphisms
- Abstract
Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read–discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets. [ABSTRACT FROM AUTHOR]
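The concordance figure reported above (>99% of variants rejected by a lenient single-curator strategy were also rejected by the more stringent multi-curator strategy) is a set-overlap ratio. A minimal Python sketch follows, assuming each curation strategy produces a pass/fail label per variant ID. The function names, variant IDs, and labels are hypothetical toy values for illustration, not the authors' data or code.

```python
"""Hypothetical sketch: concordance between a lenient (single-curator) and a
stringent (multi-curator) curation of structural variant (SV) calls."""


def rejected(calls: dict[str, bool]) -> set[str]:
    """Return IDs of SVs that a curation strategy rejected (False = rejected)."""
    return {sv_id for sv_id, passed in calls.items() if not passed}


def concordance(lenient: dict[str, bool], stringent: dict[str, bool]) -> float:
    """Fraction of lenient rejections that the stringent strategy also rejected.

    Returns NaN when the lenient strategy rejected nothing (ratio undefined).
    """
    lenient_rej = rejected(lenient)
    if not lenient_rej:
        return float("nan")
    return len(lenient_rej & rejected(stringent)) / len(lenient_rej)


# Made-up curation outcomes (True = kept, False = rejected as putative false positive).
lenient = {"sv1": True, "sv2": False, "sv3": False, "sv4": True, "sv5": False}
stringent = {"sv1": True, "sv2": False, "sv3": False, "sv4": False, "sv5": False}

print(concordance(lenient, stringent))  # 1.0
```

In this toy case the stringent strategy rejects everything the lenient one does plus one extra call (sv4), so concordance is 1.0; a value just below 1.0 would correspond to the ">99%" agreement the abstract describes.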
- Published
- 2024