Hap-seq: an optimal algorithm for haplotype phasing with imputation using sequencing data.
- Source :
- Journal of computational biology : a journal of computational molecular cell biology [J Comput Biol] 2013 Feb; Vol. 20 (2), pp. 80-92.
- Publication Year :
- 2013
Abstract
- Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Traditionally, haplotypes are inferred from genotype data obtained from microarrays using information on population haplotype frequencies inferred from either a large sample of genotyped individuals or a reference dataset such as the HapMap. Since large reference datasets became available, modern approaches to haplotype phasing along these lines have been closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and assembly method that combines information from a reference dataset with the information from the reads using a likelihood framework. Our method applies a dynamic programming algorithm to identify the haplotype that maximizes the joint likelihood of the haplotype with respect to the reference dataset and with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy than state-of-the-art imputation methods.
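The joint-likelihood idea described in the abstract can be illustrated with a toy sketch. This is not the Hap-seq algorithm: the paper uses dynamic programming, whereas the code below enumerates every haplotype pair by brute force, which is feasible only for a handful of sites. The reference model (a smoothed haplotype frequency), the read error model (independent per-site error rate `err`), and all function and variable names are assumptions made for illustration and do not come from the paper.

```python
import itertools
import math


def read_likelihood(read, hap, err):
    """P(read | hap): each covered site matches with prob 1 - err, else err."""
    p = 1.0
    for site, allele in read.items():
        p *= (1.0 - err) if hap[site] == allele else err
    return p


def reference_prob(hap, panel, pseudo=0.1):
    """Smoothed frequency of hap in the panel (assumed toy reference model)."""
    count = sum(1 for ref in panel if ref == hap)
    return (count + pseudo) / (len(panel) + pseudo * 2 ** len(hap))


def joint_log_likelihood(h1, h2, reads, panel, err=0.01):
    """Log-likelihood of a haplotype pair under the reference and the reads."""
    ll = math.log(reference_prob(h1, panel)) + math.log(reference_prob(h2, panel))
    for read in reads:
        # A read is assumed equally likely to come from either chromosome.
        ll += math.log(0.5 * read_likelihood(read, h1, err)
                       + 0.5 * read_likelihood(read, h2, err))
    return ll


def best_haplotype_pair(reads, panel, n_sites, err=0.01):
    """Brute-force search over all unordered pairs of binary haplotypes."""
    haps = list(itertools.product((0, 1), repeat=n_sites))
    pairs = itertools.combinations_with_replacement(haps, 2)
    return max(pairs, key=lambda p: joint_log_likelihood(p[0], p[1], reads, panel, err))


# Hypothetical data: 3 biallelic sites, a 5-haplotype reference panel, 3 reads.
# Each read maps site index -> observed allele, so alleles in one read are linked.
EXAMPLE_PANEL = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (1, 1, 0)]
EXAMPLE_READS = [{0: 1, 1: 1}, {1: 0, 2: 0}, {0: 0, 1: 0}]

if __name__ == "__main__":
    print(best_haplotype_pair(EXAMPLE_READS, EXAMPLE_PANEL, n_sites=3))
```

In this example no read observes the third site on the chromosome carrying alleles 1, 1 at the first two sites, so the reads alone cannot determine it. The reference term fills the gap, which is the sense in which a method of this kind acts as both an assembly and an imputation method.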
- Subjects :
- Chromosome Mapping methods
Computer Simulation
Databases, Genetic
Gene Frequency
Genetic Variation
HapMap Project
Humans
Likelihood Functions
Molecular Sequence Annotation methods
Algorithms
Alleles
Chromosome Mapping statistics & numerical data
Haplotypes
Molecular Sequence Annotation statistics & numerical data
Details
- Language :
- English
- ISSN :
- 1557-8666
- Volume :
- 20
- Issue :
- 2
- Database :
- MEDLINE
- Journal :
- Journal of computational biology : a journal of computational molecular cell biology
- Publication Type :
- Academic Journal
- Accession number :
- 23383995
- Full Text :
- https://doi.org/10.1089/cmb.2012.0091