Comparison of phasing strategies for whole human genomes
- Source :
- PLoS Genetics, Vol 14, Iss 4, p e1007308 (2018), PLoS Genetics
- Publication Year :
- 2018
- Publisher :
- Public Library of Science (PLoS), 2018.
Abstract
- Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact: they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, and phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced by the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that including parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors.
We also considered a majority voting scheme for constructing a consensus haplotype that combines multiple predictions for enhanced performance and site coverage. Finally, we identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
Author summary
- Humans are a diploid species that inherit one set of chromosomes paternally and one set of chromosomes maternally. Separating the nucleotide content of the maternally and paternally derived chromosomes for an individual, i.e., ‘phasing’ that individual’s genome, is not trivial with today’s sequencing technologies. This is in part because most available sequencing technologies generate short sequencing reads that make it hard to assemble individual homologous chromosome pairs. Phase information can be crucial for putting into context the likely functional consequences of DNA sequence variants as well as certain evolutionary and population genetics phenomena. To assess the reliability of current sequencing-based phasing strategies, we compared 11 different approaches using a public domain reference genome as a test case. These phasing strategies included laboratory-based experimental techniques as well as purely computational approaches. Importantly, our comparisons show that a hybrid or combined approach that leverages population-based phasing via the SHAPEIT software suite works well and can be improved with the addition of genome-wide sequence read or parental genotype data.
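The abstract names switch error rate as a metric and a majority voting scheme for building a consensus haplotype, but this record does not say how either was computed. The Python sketch below is illustrative only and is not the authors' implementation. It assumes each haplotype is given as a list of 0/1 alleles at the same ordered heterozygous sites, that `None` marks an unphased site, and that all predictions passed to the voting function have already been oriented to the same haplotype labeling. The function names `switch_error_rate` and `majority_vote` are hypothetical.

```python
from collections import Counter


def switch_error_rate(truth, pred):
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    in `pred` disagrees with `truth` (both are 0/1 allele lists)."""
    if len(truth) != len(pred):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        return 0.0
    # True where the predicted allele matches the truth at that site.
    agree = [t == p for t, p in zip(truth, pred)]
    # A switch is a change in agreement state between neighboring sites.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


def majority_vote(predictions):
    """Per-site consensus across predictions; None where there are no
    calls or the top two alleles tie. Assumes a shared orientation."""
    consensus = []
    for calls in zip(*predictions):
        counts = Counter(c for c in calls if c is not None)
        ranked = counts.most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            consensus.append(None)
        else:
            consensus.append(ranked[0][0])
    return consensus
```

Because the metric compares neighboring sites rather than absolute labels, a prediction that is the exact complement of the truth scores zero switch errors, which reflects the fact that the naming of the two haplotypes is arbitrary.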
- Subjects :
- 0301 basic medicine
Male
Cancer Research
Heredity
Genome
Biochemistry
Nucleic Acids
Chromosomes, Human
Genome Sequencing
Genetics (clinical)
education.field_of_study
Shotgun sequencing
Genomics
Genetic Mapping
Female
Research Article
lcsh:QH426-470
Population
Nucleotide Sequencing
Variant Genotypes
Computational biology
Biology
Research and Analysis Methods
Polymorphism, Single Nucleotide
DNA sequencing
Human Genomics
03 medical and health sciences
Genomic Imprinting
Genetics
Humans
education
Molecular Biology Techniques
Sequencing Techniques
Gene
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Comparative genomics
Sequence Assembly Tools
Genome, Human
Sequence Analysis, RNA
Haplotype
Biology and Life Sciences
Computational Biology
DNA
Sequence Analysis, DNA
Comparative Genomics
Genome Analysis
lcsh:Genetics
030104 developmental biology
Haplotypes
Human genome
Details
- Language :
- English
- ISSN :
- 1553-7404 and 1553-7390
- Volume :
- 14
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- PLoS Genetics
- Accession number :
- edsair.doi.dedup.....4a29f4997539217eeda800965b7a606a