Accurate Phasing of Pedigree Genotypes Using Whole Genome Sequence Data
- Author
Harald H H Göring, August Blackburn, Juan M. Peralta, John Blangero, Nicholas B. Blackburn, Donna M. Lehman, Lucy Blondell, Mark Z. Kos, and Stevens P
- Subjects
Genetics, Whole genome sequencing, 0303 health sciences, 030305 genetics & heredity, Haplotype, Word error rate, Pedigree chart, Computational biology, Biology, 03 medical and health sciences, Chromosome (genetic algorithm), Genotype, Allele, Genotyping, 030304 developmental biology
- Abstract
Phasing, the process of predicting haplotypes from genotype data, is an important undertaking in genetics and an ongoing area of research. Phasing methods, and associated software, designed specifically for pedigrees are urgently needed. Here we present a new method for phasing genotypes from whole genome sequencing data in pedigrees: PULSAR (Phasing Using Lineage Specific Alleles / Rare variants). The method is built upon the idea that alleles that are specific to a single founding chromosome within a pedigree, which we refer to as lineage-specific alleles, are highly informative for identifying haplotypes that are identical-by-descent between individuals within a pedigree. Through extensive simulation, we assess the performance of PULSAR across a variety of pedigree sizes and structures, and we explore the effects of genotyping errors and the presence of non-sequenced individuals on its performance. If the genotyping error rate is sufficiently low, PULSAR can phase > 99.9% of heterozygous genotypes with a switch error rate below 1 × 10⁻⁴ in pedigrees where all individuals are sequenced. We demonstrate that the method is highly accurate and consistently outperforms the long-range phasing approach used for comparison in our benchmarking. The method also holds promise for fixing genotype errors or imputing missing genotypes. The software implementation of this method is freely available.
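The abstract reports accuracy as a switch error rate. As a rough illustration of how such a metric is commonly computed, the sketch below compares an inferred phasing against a known truth for a single individual. It assumes the conventional definition (phase flips between consecutive heterozygous sites, divided by the number of such consecutive pairs); the function name `switch_error_rate` and the tuple encoding of genotypes are hypothetical, and this is not taken from the PULSAR software or the authors' benchmarking code.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a phased genotype is (allele on haplotype A, allele on haplotype B).
Genotype = Tuple[int, int]


def switch_error_rate(truth: Sequence[Genotype], inferred: Sequence[Genotype]) -> float:
    """Fraction of consecutive heterozygous site pairs whose relative phase is wrong."""
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")

    # For each heterozygous site, record whether the inferred phase has the
    # same orientation as the truth (True) or the flipped orientation (False).
    orientations: List[bool] = []
    for t, p in zip(truth, inferred):
        t, p = tuple(t), tuple(p)
        if t[0] == t[1]:
            continue  # homozygous sites carry no phase information
        if p == t:
            orientations.append(True)
        elif p == (t[1], t[0]):
            orientations.append(False)
        else:
            raise ValueError("inferred genotype does not match the true genotype")

    if len(orientations) < 2:
        return 0.0  # no consecutive heterozygous pairs to evaluate

    # A switch error occurs whenever the orientation changes between neighbours.
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)
```

Under this definition a phasing that is flipped along its whole length scores zero, because only changes in orientation between neighbouring heterozygous sites are counted.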
- Published
- 2017