A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads

Authors :
Yumi Yamaguchi-Kabata
Takahiro Mimori
Masao Nagasaki
Kaname Kojima
Mamoru Takahashi
Naoki Nariai
Yukuto Sato
Source :
Bioinformatics (Oxford, England). 29(22)
Publication Year :
2013

Abstract

Motivation: Variant calling from genome-wide sequencing data is essential for the analysis of disease-causing mutations and the elucidation of disease mechanisms. However, variant calling in low coverage regions is difficult because of sequence read errors and mapping errors. Hence, variant calling approaches that are robust to low coverage data are in demand.

Results: We propose a new variant calling approach that considers pedigree information and haplotyping based on phase informative reads, that is, sequence reads spanning two or more heterozygous positions. In our approach, genotyping is performed simultaneously with haplotyping, in which each read is assigned to a haplotype based on phase informative reads. Therefore, positions with low evidence for heterozygosity are rescued by phase informative reads, and such rescued positions contribute to haplotyping in a synergistic way. In addition, pedigree information supports more accurate haplotyping as well as genotyping, especially in low coverage regions. Although heterozygous positions are useful for haplotyping, homozygous positions are not informative and weaken the information from heterozygous positions, as the majority of positions are homozygous. Thus, we introduce latent variables that determine zygosity at each position to filter out homozygous positions for haplotyping. In a performance evaluation with parent–offspring trio sequencing data, our approach outperforms existing approaches in accuracy, measured as agreement with single nucleotide polymorphism array genotyping results. Also, a performance analysis considering the distance between variants showed that the use of phase informative reads is effective for accurate variant calling, and further performance improvement is expected with longer sequencing reads.

Contact: nagasaki@megabank.tohoku.ac.jp or kojima@megabank.tohoku.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
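The definition of a phase informative read given in the abstract (a read spanning two or more heterozygous positions) can be illustrated with a minimal Python sketch. This is not the authors' statistical model, which infers zygosity through latent variables and also uses pedigree information; here the heterozygous positions are simply assumed to be known in advance. The read representation (a dictionary from reference position to observed base) and the function names `covered_het_sites` and `phase_informative_reads` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical representation: a read maps reference positions to observed bases.
Read = Dict[int, str]


def covered_het_sites(read: Read, het_positions: Set[int]) -> List[int]:
    """Return the heterozygous positions covered by a read, in sorted order."""
    return sorted(pos for pos in read if pos in het_positions)


def phase_informative_reads(reads: List[Read], het_positions: Set[int]) -> List[int]:
    """Return indices of reads that span two or more heterozygous positions."""
    return [
        i
        for i, read in enumerate(reads)
        if len(covered_het_sites(read, het_positions)) >= 2
    ]


# Toy data (assumed, not taken from the paper).
reads = [
    {100: "A", 101: "C", 150: "G"},  # covers het sites 100 and 150
    {150: "T", 180: "C"},            # covers het sites 150 and 180
    {300: "G"},                      # covers no het site
]
het_positions = {100, 150, 180}

informative = phase_informative_reads(reads, het_positions)
print(informative)  # indices of the phase informative reads
```

In this toy input, the first two reads each link a pair of heterozygous sites and so carry phasing information, while the third read does not. In the approach described above, zygosity is not fixed beforehand as it is here but is estimated jointly with the haplotypes.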

Details

ISSN :
1367-4811
Volume :
29
Issue :
22
Database :
OpenAIRE
Journal :
Bioinformatics (Oxford, England)
Accession number :
edsair.doi.dedup.....64072f5f3266ad364ca39fb5bf727e8f