Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data via TRUFFLE
- Source :
- The American Journal of Human Genetics. 105:78-88
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- Relationship estimation and segment detection between individuals are important aspects of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using un-phased data. By skipping the haplotype phasing step and instead relying on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified only three pairs related as second cousins or closer across different populations, while commonly used methods identified a large number of such pairs. Similarly, within populations, we identified far fewer related pairs than those methods. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and produces fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When applied to pedigree data, TRUFFLE detected 1st- to 5th-degree relationships with 99.6% accuracy. As genomic datasets become much larger, accurate IBD segment detection with TRUFFLE can enable disease gene mapping through implicitly shared haplotypes.
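The abstract describes the region-based idea only at a high level. To illustrate why phasing can be skipped, here is a minimal Python sketch of a generic genotype-comparison approach; it is not TRUFFLE's actual algorithm or code. The assumptions: genotypes are alternate-allele counts (0, 1, 2) with no missing calls; opposite homozygotes (0 vs 2) rule out any IBD at a marker, and any mismatch rules out IBD2, so runs free of such conflicts are candidate segments. The names `detect_segments`, `kinship`, `min_markers`, and `max_gap` are hypothetical. Segment length is counted in markers rather than centimorgans, and bridging at most `max_gap` conflicting markers is a crude stand-in for the error model the abstract mentions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """A run of markers by index, inclusive on both ends."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def conflict_free_runs(conflict: Sequence[bool]) -> List[Segment]:
    """Return maximal runs of consecutive markers with no conflict."""
    runs: List[Segment] = []
    start = None
    for i, is_conflict in enumerate(conflict):
        if not is_conflict and start is None:
            start = i
        elif is_conflict and start is not None:
            runs.append(Segment(start, i - 1))
            start = None
    if start is not None:
        runs.append(Segment(start, len(conflict) - 1))
    return runs


def bridge_errors(runs: List[Segment], max_gap: int) -> List[Segment]:
    """Join neighbouring runs separated by at most max_gap conflicting markers,
    treating those conflicts as probable genotyping errors (crude assumption)."""
    merged: List[Segment] = []
    for seg in runs:
        if merged and seg.start - merged[-1].end - 1 <= max_gap:
            merged[-1] = Segment(merged[-1].start, seg.end)
        else:
            merged.append(seg)
    return merged


def detect_segments(g1: Sequence[int], g2: Sequence[int], min_markers: int = 5,
                    max_gap: int = 1, ibd2: bool = False) -> List[Segment]:
    """Candidate shared segments for one pair of unphased genotype vectors.

    ibd2=False: only opposite homozygotes (0 vs 2) conflict -> IBD >= 1 candidates.
    ibd2=True: any genotype mismatch conflicts -> IBD2 candidates.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same markers")
    if ibd2:
        conflict = [a != b for a, b in zip(g1, g2)]
    else:
        conflict = [abs(a - b) == 2 for a, b in zip(g1, g2)]
    runs = bridge_errors(conflict_free_runs(conflict), max_gap)
    # Drop runs too short to be meaningful (length in markers, not cM).
    return [seg for seg in runs if seg.length >= min_markers]


def kinship(g1: Sequence[int], g2: Sequence[int],
            min_markers: int = 5, max_gap: int = 1) -> float:
    """Kinship coefficient k1/4 + k2/2 from the fraction of markers covered."""
    n = len(g1)
    if n == 0:
        raise ValueError("need at least one marker")

    def covered(segments: List[Segment]) -> set:
        return {i for s in segments for i in range(s.start, s.end + 1)}

    any_ibd = covered(detect_segments(g1, g2, min_markers, max_gap, ibd2=False))
    both_ibd = covered(detect_segments(g1, g2, min_markers, max_gap, ibd2=True))
    k1 = len(any_ibd - both_ibd) / n  # markers sharing exactly one haplotype
    k2 = len(both_ibd) / n            # markers sharing both haplotypes
    return k1 / 4 + k2 / 2


if __name__ == "__main__":
    parent = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
    child = [1, 1, 1, 2, 0, 1, 0, 1, 1, 2]
    print(detect_segments(parent, child))
    print(kinship(parent, child))
```

Under these assumptions a pair sharing one haplotype everywhere (for example parent and offspring) scores about 0.25, and identical genotypes score 0.5. Bridging conflicts before the length filter is deliberately simple; a practical method would also account for allele frequencies and genotyping-error rates and would measure segments in genetic distance.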
- Subjects :
- Male; Genetic Linkage; Computer science; Polymorphism, Single Nucleotide; Article; 03 medical and health sciences; Quantitative Trait, Heritable; 0302 clinical medicine; Software; Genetics; Humans; Computer Simulation; Genetic Predisposition to Disease; 1000 Genomes Project; Germ-Line Mutation; Genetics (clinical); Selection (genetic algorithm); 030304 developmental biology; Estimation; 0303 health sciences; Truffle; Models, Genetic; Genome, Human; business.industry; Chromosome Mapping; Pattern recognition; Genomics; Phaser; Pedigree; Visualization; Identification (information); Genetics, Population; Haplotypes; Female; Artificial intelligence; business; Algorithms; 030217 neurology & neurosurgery; Genome-Wide Association Study
Details
- ISSN :
- 0002-9297
- Volume :
- 105
- Database :
- OpenAIRE
- Journal :
- The American Journal of Human Genetics
- Accession number :
- edsair.doi.dedup.....95908c93b2cab92be26282afc28ae0d4
- Full Text :
- https://doi.org/10.1016/j.ajhg.2019.05.007