Anne Puel, Stephanie Boisson-Dupuis, Davood Mansouri, Avinash Abhyankar, Shen-Ying Zhang, Silvia Sánchez-Ramón, Jean-Laurent Casanova, Jacinta Bustamante, Antonio Condino-Neto, Francois Vandenesch, Bertrand Boisson, Vincent Pedergnana, Sara Sebnem Kilic, Emmanuelle Jouanguy, Aurélie Cobat, Didier Raoult, Melike Emiroglu, CHU Necker - Enfants Malades [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP), Imagine - Institut des maladies génétiques (IHU) (Imagine - U1163), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Paris Cité (UPC), The Wellcome Trust Centre for Human Genetics [Oxford], University of Oxford [Oxford], St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller University [New York], New York Genome Center [New York], Hôpital militaire d'instruction Mohammed V [Rabat, Maroc], CHU Ibn Rochd [Casablanca], Université Hassan II [Casablanca] (UH2MC), Howard Hughes Medical Institute [New York] (HHMI), Howard Hughes Medical Institute (HHMI)-New York University School of Medicine, NYU System (NYU)-NYU System (NYU)-Rockefeller University [New York]-Columbia University Irving Medical Center (CUIMC), and Exome/Array Consortium: Waleed Al-Herz, Cigdem Arikan, Peter Arkwright, Cigdem Aydogmus, Olivier Bernard, Lizbeth Blancas-Galicia, Stéphanie Boisson-Dupuis, Damien Bonnet, Omar Boudghene Stambouli, Lobna Boussofara, Jeannette Boutros, Jacinta Bustamante, Michael Ciancanelli, Theresa Cole, Antonio Condino-Neto, Mukesh Desai, Claire Fieschi, José Luis Franco, Philippe Ichai, Emmanuelle Jouanguy, Melike Keser-Emiroglu, Sara S Kilic, Seyed Alireza Mahdaviani, Nizar Mahlhoui, Davood Mansouri, Nima Parvaneh, Capucine Picard, Anne Puel, Didier Raoult, Nima Rezaei, Ozden Sanal, Silvia Sanchez Ramon, François Vandenesch, Guillaume Vogt, Shen-Ying Zhang
Significance: We compared the information provided by whole-exome sequencing (WES) and genome-wide single-nucleotide variant arrays in terms of principal component analysis, homozygosity rate estimation, and linkage analysis using 110 subjects originating from different regions of the world. WES provided an accurate prediction of population substructure using high-quality variants with a minor allele frequency > 2% and reliable estimation of homozygosity rates using runs of homozygosity. Finally, homozygosity mapping in 15 consanguineous families showed that WES led to powerful linkage analyses, particularly in coding regions. Overall, our study shows that WES could be used for several analyses that are very helpful to optimize the search for disease-causing exome variants.

Abstract: Principal component analysis (PCA), homozygosity rate estimations, and linkage studies in humans are classically conducted through genome-wide single-nucleotide variant arrays (GWSA). We compared whole-exome sequencing (WES) and GWSA for this purpose. We analyzed 110 subjects originating from different regions of the world, including North Africa and the Middle East, which are poorly covered by public databases and have high consanguinity rates. We tested and applied a number of quality control (QC) filters. Compared with GWSA, we found that WES provided an accurate prediction of population substructure using variants with a minor allele frequency > 2% (correlation = 0.89 with the PCA coordinates obtained by GWSA). WES also yielded highly reliable estimates of homozygosity rates using runs of homozygosity with a 1,000-kb window (correlation = 0.94 with the estimates provided by GWSA). Finally, homozygosity mapping analyses in 15 families including a single offspring with high homozygosity rates showed that WES provided 51% less genome-wide linkage information than GWSA overall but 97% more information for the coding regions. 
At the genome-wide scale, 76.3% of linked regions were found by both GWSA and WES, 17.7% were found by GWSA only, and 6.0% were found by WES only. For coding regions, the corresponding percentages were 83.5%, 7.4%, and 9.1%, respectively. With appropriate QC filters, WES can be used for PCA and adjustment for population substructure, estimating homozygosity rates in individuals, and powerful linkage analyses, particularly in coding regions.
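As an illustration of two quantities named in the abstract, the sketch below filters variants by minor allele frequency and estimates a homozygosity rate from runs of homozygosity. It is a minimal sketch under stated assumptions and not the authors' pipeline: the function names, the 0/1/2 genotype coding, and the toy data are hypothetical, and the 1,000-kb value is treated here as a minimum run length rather than reproducing a sliding-window scan.

```python
from typing import List, Sequence, Tuple


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF for one variant; genotypes are alternate-allele counts (0, 1 or 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants: Sequence[Sequence[int]], threshold: float = 0.02) -> List[int]:
    """Indices of variants whose MAF exceeds the threshold (2% by default)."""
    return [i for i, g in enumerate(variants) if minor_allele_frequency(g) > threshold]


def runs_of_homozygosity(
    positions: Sequence[int],
    genotypes: Sequence[int],
    min_length_bp: int = 1_000_000,  # assumption: 1,000 kb used as a minimum run length
) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive homozygous calls (0 or 2) for one subject.

    positions must be sorted base-pair coordinates on a single chromosome.
    A heterozygous call (1) ends the current run.
    """
    runs: List[Tuple[int, int]] = []
    start = last = None
    for pos, g in zip(positions, genotypes):
        if g != 1:
            if start is None:
                start = pos
            last = pos
        else:
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = None
    # Close a run that extends to the last variant.
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs


def homozygosity_rate(runs: Sequence[Tuple[int, int]], covered_length_bp: int) -> float:
    """Fraction of the covered length that lies inside runs of homozygosity."""
    return sum(end - start for start, end in runs) / covered_length_bp


# Toy data (hypothetical): three variants typed in four subjects.
toy_variants = [[0, 0, 0, 1], [2, 2, 2, 2], [0, 0, 0, 0]]
kept = filter_by_maf(toy_variants)

# Toy data (hypothetical): one subject, seven variants on a 4-Mb segment.
toy_positions = [0, 500_000, 1_200_000, 1_500_000, 2_000_000, 2_400_000, 3_600_000]
toy_genotypes = [0, 2, 0, 1, 2, 2, 0]
toy_runs = runs_of_homozygosity(toy_positions, toy_genotypes)
toy_rate = homozygosity_rate(toy_runs, 4_000_000)
```

In practice, exome data are sparse outside coding regions, so the run-detection parameters and QC filters matter a great deal; the abstract reports that such filters were tested but does not list them, so none are assumed here.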