Madeline H Kowalski, Huijun Qian, Ziyi Hou, Jonathan D Rosen, Amanda L Tapia, Yue Shan, Deepti Jain, Maria Argos, Donna K Arnett, Christy Avery, Kathleen C Barnes, Lewis C Becker, Stephanie A Bien, Joshua C Bis, John Blangero, Eric Boerwinkle, Donald W Bowden, Steve Buyske, Jianwen Cai, Michael H Cho, Seung Hoan Choi, Hélène Choquet, L Adrienne Cupples, Mary Cushman, Michelle Daya, Paul S de Vries, Patrick T Ellinor, Nauder Faraday, Myriam Fornage, Stacey Gabriel, Santhi K Ganesh, Misa Graff, Namrata Gupta, Jiang He, Susan R Heckbert, Bertha Hidalgo, Chani J Hodonsky, Marguerite R Irvin, Andrew D Johnson, Eric Jorgenson, Robert Kaplan, Sharon L R Kardia, Tanika N Kelly, Charles Kooperberg, Jessica A Lasky-Su, Ruth J F Loos, Steven A Lubitz, Rasika A Mathias, Caitlin P McHugh, Courtney Montgomery, Jee-Young Moon, Alanna C Morrison, Nicholette D Palmer, Nathan Pankratz, George J Papanicolaou, Juan M Peralta, Patricia A Peyser, Stephen S Rich, Jerome I Rotter, Edwin K Silverman, Jennifer A Smith, Nicholas L Smith, Kent D Taylor, Timothy A Thornton, Hemant K Tiwari, Russell P Tracy, Tao Wang, Scott T Weiss, Lu-Chen Weng, Kerri L Wiggins, James G Wilson, Lisa R Yanek, Sebastian Zöllner, Kari E North, Paul L Auer, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Hematology & Hemostasis Working Group, Laura M Raffield, Alexander P Reiner, and Yun Li
Most genome-wide association and fine-mapping studies to date have been conducted in individuals of European descent, and genetic studies of populations of Hispanic/Latino and African ancestry are limited. In addition, these populations have more complex linkage disequilibrium structure. To better define the genetic architecture of these understudied populations, we leveraged >100,000 phased sequences available from deep-coverage whole genome sequencing through the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) program to impute genotypes into admixed African and Hispanic/Latino samples with genome-wide genotyping array data. We demonstrated that using TOPMed sequencing data as the imputation reference panel improved genotype imputation quality in these populations, which in turn enhanced gene-mapping power for complex traits. For rare variants with minor allele frequency (MAF) < 0.5%, we observed a 2.3- to 6.1-fold increase in the number of well-imputed variants, with an 11-34% improvement in average imputation quality, compared to the state-of-the-art 1000 Genomes Project Phase 3 and Haplotype Reference Consortium reference panels. Impressively, even for extremely rare variants with minor allele count 86%. Subsequent association analyses of TOPMed reference panel-imputed genotype data with hematological traits (hemoglobin [HGB], hematocrit [HCT], and white blood cell count [WBC]) in ~21,600 African-ancestry and ~21,700 Hispanic/Latino individuals identified associations with two rare variants in the HBB gene: rs33930165 with higher WBC (p = 8.8 × 10⁻¹⁵) in African populations, and rs11549407 with lower HGB (p = 1.5 × 10⁻¹²) and HCT (p = 8.8 × 10⁻¹⁰) in Hispanics/Latinos. By comparison, neither variant would have reached genome-wide significance if either the 1000 Genomes Project Phase 3 or the Haplotype Reference Consortium reference panel had been used for imputation.
Our findings highlight the utility of the TOPMed imputation reference panel for identification of novel rare variant associations not previously detected in similarly sized genome-wide studies of under-represented African and Hispanic/Latino populations.
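The two panel-comparison metrics quoted above (fold increase in the number of well-imputed rare variants, and relative improvement in average imputation quality) can be illustrated with a minimal sketch. It makes several assumptions not stated in the text: that imputation quality is summarized by a per-variant estimated R² value, that "well-imputed" means R² ≥ 0.8, and that the same set of variants is scored under both panels. The function names and the toy values are hypothetical and are not data from this study.

```python
from statistics import mean

RARE_MAF = 0.005  # MAF < 0.5%, the rare-variant definition used in the abstract
RSQ_MIN = 0.8     # ASSUMED cutoff for "well-imputed"; not given in the text


def summarize(variants, rsq_min=RSQ_MIN, maf_max=RARE_MAF):
    """Summarize rare variants from a list of (maf, rsq) pairs.

    Returns (number of well-imputed rare variants, mean rsq over rare variants).
    """
    rare = [rsq for maf, rsq in variants if maf < maf_max]
    n_well = sum(1 for rsq in rare if rsq >= rsq_min)
    avg = mean(rare) if rare else float("nan")
    return n_well, avg


def compare(new_panel, old_panel):
    """Return (fold increase in well-imputed count, relative gain in mean rsq)."""
    n_new, q_new = summarize(new_panel)
    n_old, q_old = summarize(old_panel)
    fold = n_new / n_old if n_old else float("inf")
    gain = (q_new - q_old) / q_old
    return fold, gain


# Toy (maf, rsq) values for illustration only; NOT results from the study.
toy_new = [(0.001, 0.92), (0.002, 0.85), (0.004, 0.81), (0.003, 0.60), (0.02, 0.99)]
toy_old = [(0.001, 0.55), (0.002, 0.82), (0.004, 0.70), (0.003, 0.40), (0.02, 0.98)]

fold, gain = compare(toy_new, toy_old)
print(f"fold increase in well-imputed rare variants: {fold:.1f}")
print(f"relative gain in mean imputation quality: {gain:.1%}")
```

In practice the (MAF, R²) pairs would come from the per-variant quality output of the imputation software rather than from hard-coded lists, and the cutoff should match whatever definition of "well-imputed" the analysis adopts.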