Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training.

Authors :
Dor, Ofer
Zhou, Yaoqi
Source :
Proteins; Mar 2007, Vol. 66 Issue 4, p838-845, 8p
Publication Year :
2007

Abstract

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained. Proteins 2007. © 2006 Wiley-Liss, Inc. [ABSTRACT FROM AUTHOR]
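
As an illustration of the two measures named in the abstract, the sketch below computes Q3 as the percentage of correctly classified residues and assigns a solvent-accessibility class from RSA cutoffs. It is not SPINE's code. The function names `q3` and `rsa_state`, the one-letter state strings (H, E, C), and the treatment of a value exactly at a cutoff as belonging to the higher class are all assumptions made for this example; the abstract does not specify them.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are per-residue state strings of equal length,
    e.g. 'H' (helix), 'E' (strand), 'C' (coil). The alphabet is an
    assumption of this sketch; only equality per position matters.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_state(rsa_percent: float, cutoffs=(25.0,)) -> int:
    """Class index for a relative solvent accessibility given in percent.

    Returns the number of cutoffs that rsa_percent meets or exceeds:
    cutoffs=(25.0,) gives two states (0 or 1), and
    cutoffs=(25.0, 75.0) gives three states (0, 1 or 2).
    Using >= at the boundary is an assumption of this sketch.
    """
    return sum(rsa_percent >= c for c in cutoffs)


if __name__ == "__main__":
    # One mismatch in ten residues.
    print(q3("HHHEEECCCC", "HHHEECCCCC"))
    # Two-state and three-state assignments for the same residue.
    print(rsa_state(50.0), rsa_state(50.0, (25.0, 75.0)))
```

In a 10-fold cross-validation, a score like this would be computed on each held-out fold and then combined across folds; how the per-fold values are combined is not stated in the abstract.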

Details

Language :
English
ISSN :
08873585
Volume :
66
Issue :
4
Database :
Complementary Index
Journal :
Proteins
Publication Type :
Academic Journal
Accession number :
64228633
Full Text :
https://doi.org/10.1002/prot.21298