1. Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification
- Author
Amy L. Williams, Ramya Babu, John Blangero, Ravindranath Duggirala, Daniel N. Seidman, Joanne E. Curran, Sushila A. Shenoy, Donna M. Lehman, Minsoo Kim, Thomas D. Dyer, and Ian G. Woods
- Subjects
Computer science; Mexican americans; Polymorphism, Single Nucleotide; Identity by descent; Article; 03 medical and health sciences; 0302 clinical medicine; Gene Frequency; Chromosome (genetic algorithm); Genetics; Humans; Allele frequency; Alleles; Genetics (clinical); 030304 developmental biology; Ibis; 0303 health sciences; Models, Genetic; biology; Genome, Human; business.industry; Pattern recognition; biology.organism_classification; Chromosomes, Human, Pair 2; Artificial intelligence; Allele sharing; business; Sequence Analysis; 030217 neurology & neurosurgery
- Abstract
Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.
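The speedup figures in the abstract can be approximately recomputed from the runtimes it quotes. The following is a minimal sketch, not the authors' benchmarking code: the variable and function names are illustrative, and the inputs are the rounded values from the abstract. Because of that rounding, the results land near the reported ranges (roughly 802-936× against the phased pipelines and 20.0-22.9× against TRUFFLE) rather than exactly on 805-946× and 20.2-23.3×.

```python
# Rough recomputation of the IBIS speedups from the rounded runtimes
# quoted in the abstract. All names here are illustrative assumptions.

MIN_PER_HOUR = 60
MIN_PER_DAY = 24 * MIN_PER_HOUR

# Beagle 5 phasing: 4.3 CPU days, converted to minutes.
phasing_minutes = 4.3 * MIN_PER_DAY

# Segment detection time after phasing, in minutes.
detector_minutes = {
    "Refined IBD": 2.9 * MIN_PER_HOUR,
    "GERMLINE": 1.1 * MIN_PER_HOUR,
}

# IBIS runtime without and with IBD2 detection, in minutes.
ibis_minutes = (6.8, 7.8)


def speedup_range(baseline_minutes, ibis=ibis_minutes):
    """Return (lowest, highest) speedup of IBIS over a baseline runtime."""
    ratios = [baseline_minutes / m for m in ibis]
    return min(ratios), max(ratios)


# Phased pipelines: phasing time plus segment detection time.
phased = [speedup_range(phasing_minutes + t) for t in detector_minutes.values()]
phased_low = min(low for low, _ in phased)
phased_high = max(high for _, high in phased)

# TRUFFLE needs no phasing: 2.6 h total.
truffle_low, truffle_high = speedup_range(2.6 * MIN_PER_HOUR)

print(f"vs. phasing + detection: {phased_low:.0f}-{phased_high:.0f}x")
print(f"vs. TRUFFLE: {truffle_low:.1f}-{truffle_high:.1f}x")
```

The lower bound of each range pairs the faster competitor with the slower IBIS configuration (IBD2 enabled), and the upper bound pairs the slower competitor with the faster IBIS configuration.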
- Published
- 2020