Identification of pathogenic variant enriched regions across genes and gene families
- Source :
- Genome Res
- Publication Year :
- 2019
Abstract
- Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene family members, and genetic variants within these regions are potentially more likely to confer risk of disease. Here, we generated 2,871 gene family protein sequence alignments involving 9,990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population onto these alignments and compared their distribution with that of 65,034 missense variants from patients. With this gene family approach, we identified 398 regions enriched for patient variants, spanning 33,887 amino acids in 1,058 genes. By comparison, testing the same genes individually identified fewer patient variant enriched regions, involving only 2,167 amino acids and 180 genes. Next, we selected de novo variants from 6,753 patients with neurodevelopmental disorders and 1,911 unaffected siblings, and observed a 5.56-fold enrichment of patient variants in our identified regions (95% CI = 2.76 to Inf, p-value = 6.66 × 10^-8). Using an independent ClinVar variant set, we found that missense variants inside the identified regions are 111-fold more likely to be classified as pathogenic than as benign (OR = 111.48, 95% CI = 68.09 to 195.58, p-value < 2.2 × 10^-16). All patient variant enriched regions (PERs) identified are available online through a user-friendly platform for interactive data mining, visualization and download at http://per.broadinstitute.org. In summary, our gene family burden analysis approach identified novel patient variant enriched regions in protein sequences. This annotation can empower variant interpretation.
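The enrichment statistics quoted in the abstract (a fold enrichment, an odds ratio, a one-sided p-value) are the kind of quantities one obtains from a 2x2 contingency table of variant counts. The sketch below is only an illustration of that general idea, not the authors' pipeline: the function names `odds_ratio` and `fisher_one_sided` are hypothetical, the test shown is a plain one-sided Fisher's exact test, and the counts in the usage lines are made-up toy numbers rather than data from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
      a = patient variants inside the region
      b = population variants inside the region
      c = patient variants outside the region
      d = population variants outside the region
    """
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) with all margins fixed.

    X follows a hypergeometric distribution, so each term is
    C(row1, k) * C(n - row1, col1 - k) / C(n, col1).
    """
    row1 = a + b          # all variants inside the region
    col1 = a + c          # all patient variants
    n = a + b + c + d     # all variants
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Toy counts, chosen only to make the arithmetic easy to check by hand.
toy_or = odds_ratio(3, 1, 1, 3)
toy_p = fisher_one_sided(3, 1, 1, 3)
print(toy_or, toy_p)
```

For real analyses a tested library routine such as `scipy.stats.fisher_exact` would normally be preferred over a hand-rolled sum; the pure-Python version is used here only so that the example has no dependencies.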
- Subjects :
- Male
PROTEIN
Method
Genome-wide association study
medicine.disease_cause
ANNOTATION
User-Computer Interface
0302 clinical medicine
Protein sequencing
Missense variants
SEQUENCE VARIANTS
Missense mutation
Genetics (clinical)
Genetics
chemistry.chemical_classification
Mutation
0303 health sciences
education.field_of_study
Protein function
318 Medical biotechnology
1184 Genetics, developmental biology, physiology
Chromosome Mapping
3. Good health
Amino acid
Multigene Family
Variant classification
Identification (biology)
Female
Genetics & genetic processes [F10] [Life sciences]
Génétique & processus génétiques [F10] [Sciences du vivant]
Gene families
DATABASE
Population
Mutation, Missense
Biology
03 medical and health sciences
Genetic variation
medicine
Gene family
Humans
Genetic Predisposition to Disease
Amino Acid Sequence
paralogs
education
COMMON
Gene
Alleles
030304 developmental biology
CONSEQUENCES
Computational Biology
Genetic Variation
chemistry
Amino Acid Substitution
1182 Biochemistry, cell and molecular biology
030217 neurology & neurosurgery
Software
Genome-Wide Association Study
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Genome Res
- Accession number :
- edsair.doi.dedup.....8049e908ac264cfed5c976344aeeaecb