Blind Prediction of Deleterious Amino Acid Variations with SNPs&GO
- Publication Year :
- 2017
Abstract
- SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) with disease, taking protein functional annotation into account. The method is a binary classifier that implements a Support Vector Machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from the protein sequence with functional annotation encoded by Gene Ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve (AUC) of 0.88, with a low false positive rate. In almost all editions of the Critical Assessment of Genome Interpretation (CAGI) experiment, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper we summarize the best results obtained by SNPs&GO on disease-related variations in four CAGI challenges, relating to the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013) and NAGLU (CAGI 2016). The evaluation of these results provides insights into the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs. Availability: SNPs&GO is accessible at http://snps.biofold.org/snps-and-go or http://snps-and-go.biocomp.unibo.it. This article is protected by copyright. All rights reserved.
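The abstract reports three evaluation figures for a binary disease/neutral classifier: overall accuracy, AUC, and false positive rate. The sketch below shows one way to compute them from a classifier's per-variant scores. It is not SNPs&GO code: the helper names (`threshold`, `confusion_counts`, `accuracy`, `false_positive_rate`, `roc_auc`), the 0.5 decision cutoff, and the toy labels and scores are all hypothetical. AUC is computed with the Mann-Whitney formulation, i.e. the probability that a randomly chosen disease-related SAV scores higher than a randomly chosen neutral one, with ties counted as one half.

```python
"""Evaluation metrics for a binary SAV classifier (disease-related = 1, neutral = 0).

Illustrative sketch only, not SNPs&GO code. The 0.5 cutoff and the toy
data in the __main__ block are hypothetical.
"""
from typing import List, Sequence, Tuple


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Call a SAV disease-related (1) when its score is >= cutoff (hypothetical default)."""
    return [1 if s >= cutoff else 0 for s in scores]


def confusion_counts(labels: Sequence[int], predictions: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 (disease-related) as positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 0 and p == 1:
            fp += 1
        elif y == 0 and p == 0:
            tn += 1
        else:  # y == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of SAVs classified correctly: (TP + TN) / total."""
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of neutral SAVs wrongly called disease-related: FP / (FP + TN)."""
    _, fp, tn, _ = confusion_counts(labels, predictions)
    return fp / (fp + tn)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: three disease-related and three neutral SAVs.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
    preds = threshold(scores)
    print(f"accuracy = {accuracy(labels, preds):.3f}")  # 0.667
    print(f"FPR      = {false_positive_rate(labels, preds):.3f}")  # 0.333
    print(f"AUC      = {roc_auc(labels, scores):.3f}")  # 0.889
```

Accuracy and false positive rate depend on the chosen cutoff, whereas AUC summarizes ranking quality over all cutoffs, which is why the two kinds of figure are usually reported together. The pairwise AUC loop is quadratic; for a dataset the size of the 38,000 SwissVar SAVs, a sort-based rank computation would be the more practical choice.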
- Subjects :
- 0301 basic medicine
Support Vector Machine
disease-related variation
gene ontology
genome interpretation
machine learning
protein function
single amino acid variation
variant annotation
Single-nucleotide polymorphism
Biology
Genome
Article
alpha-N-Acetylgalactosaminidase
03 medical and health sciences
Protein sequencing
Genetics
Humans
Genetic Predisposition to Disease
Cyclin-Dependent Kinase Inhibitor p16
Genetics (clinical)
Computational Biology
Molecular Sequence Annotation
Acid Anhydride Hydrolases
DNA-Binding Proteins
Checkpoint Kinase 2
DNA Repair Enzymes
030104 developmental biology
Amino Acid Substitution
ROC Curve
Binary classification
False positive rate
Algorithms
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....7d346069b7456c0c98fbe6d289d3f71d