HSEpred: predict half-sphere exposure from protein sequences
- Source :
- Monash University
- Publication Year :
- 2008
Abstract
- Motivation: Half-sphere exposure (HSE) is a recently developed two-dimensional measure of solvent exposure. It conceptually divides the sphere around an amino acid in a protein structure into two half spheres, which represent the residue's distinct spatial neighborhoods in the upward and downward directions. The resulting HSE-up and HSE-down measures show superior performance compared with other measures such as accessible surface area, residue depth and contact number. However, no method currently exists for predicting HSE measures from sequence data.
Results: In this article, we propose a novel approach that predicts the HSE measures and infers residue contact numbers from the predicted HSE values, based on a well-prepared non-homologous protein structure dataset. In particular, we employ support vector regression (SVR) to quantify the relationship between HSE measures and protein sequences and evaluate its prediction performance. We extensively explore five sequence-encoding schemes to examine their effects on prediction performance. Our method achieves correlation coefficients of 0.72 and 0.68 between the predicted and observed HSE-up and HSE-down measures, respectively. Moreover, the contact number can be accurately predicted by summing the predicted HSE-up and HSE-down values, which further broadens the applicability of the method. The successful application of SVR in this study suggests that it should be useful for quantifying the protein sequence–structure relationship and for predicting structural property profiles from protein sequences.
Availability: The prediction webserver and supplementary materials are accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/hse/
Contact: sjn@kuicr.kyoto-u.ac.jp; takutsu@kuicr.kyoto-u.ac.jp
Supplementary Information: Supplementary data are available at Bioinformatics online.
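To make the relationship between the two HSE measures and the contact number concrete, the following is a minimal sketch of how observed HSE values could be computed from a structure. It is not the HSEpred method itself, which predicts these values from sequence with SVR. The function names, the use of the Cα to Cβ vector as the "up" direction, and the 13 Å radius are assumptions made for illustration; the record above does not state them.

```python
from math import dist


def half_sphere_exposure(ca, cb, radius=13.0):
    """Return one (hse_up, hse_down) pair per residue.

    ca, cb: lists of (x, y, z) tuples for the C-alpha and C-beta atoms.
    radius: sphere radius in angstroms (13.0 is an assumed default).
    """
    result = []
    for i, (ca_i, cb_i) in enumerate(zip(ca, cb)):
        # Assumption: the CA -> CB vector defines the "up" half sphere.
        axis = tuple(b - a for a, b in zip(ca_i, cb_i))
        up = down = 0
        for j, ca_j in enumerate(ca):
            if j == i or dist(ca_i, ca_j) > radius:
                continue  # skip the residue itself and distant atoms
            offset = tuple(q - p for p, q in zip(ca_i, ca_j))
            # The sign of the dot product decides which half sphere
            # the neighboring C-alpha atom falls into.
            if sum(u * v for u, v in zip(axis, offset)) >= 0:
                up += 1
            else:
                down += 1
        result.append((up, down))
    return result


def contact_numbers(hse):
    """Contact number per residue as HSE-up + HSE-down."""
    return [up + down for up, down in hse]


# Toy coordinates: four residues placed along the z axis.
ca = [(0, 0, 0), (0, 0, 5), (0, 0, -5), (0, 0, 20)]
cb = [(0, 0, 1), (0, 0, 6), (0, 0, -4), (0, 0, 21)]
hse = half_sphere_exposure(ca, cb)
print(hse)                   # [(1, 1), (0, 2), (2, 0), (0, 0)]
print(contact_numbers(hse))  # [2, 2, 2, 0]
```

Because every neighbor within the sphere falls into exactly one of the two half spheres, the two counts always sum to the contact number, which is why predicted HSE-up and HSE-down values can be added to estimate it.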
- Subjects :
- Statistics and Probability
Models, Molecular
Protein Conformation
Molecular Sequence Data
Biology
Biochemistry
Accessible surface area
Correlation
Protein structure
Data sequences
Sequence Analysis, Protein
Computer Simulation
Amino Acid Sequence
Contact number
Molecular Biology
Proteins
Pattern recognition
Structural property
Computer Science Applications
Support vector machine
Computational Mathematics
Computational Theory and Mathematics
Models, Chemical
Artificial intelligence
Data mining
Solvent exposure
Algorithms
Software
Details
- ISSN :
- 1367-4811
- Volume :
- 24
- Issue :
- 13
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....016bc991187a48d278b509466e7a9d57