pubmed.ncbi.nlm.nih.gov

Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training

Proteins. 2007 Mar 1;66(4):838-45.

doi: 10.1002/prot.21298.

Ofer Dor et al. Proteins. 2007.

Abstract

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one month (CPU time) of training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). This figure approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained.
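
The two scoring conventions mentioned in the abstract can be illustrated with a minimal Python sketch. It is not SPINE's code. The function names `q3` and `rsa_state`, the one-letter state labels H/E/C, and the choice to treat a value exactly at a cutoff as belonging to the higher bin are all assumptions made for illustration; the abstract states only the definition of Q3 and the 25% and 25%/75% cutoffs.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed state.

    Each string holds one state label per residue (assumed here: H, E, or C).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_state(rsa_percent: float, cutoffs=(25.0,)) -> int:
    """Bin a relative solvent accessibility (in percent) by the given cutoffs.

    Returns the number of cutoffs that rsa_percent meets or exceeds, so
    cutoffs=(25.0,) gives two states (0, 1) and cutoffs=(25.0, 75.0) gives
    three states (0, 1, 2). The >= boundary handling is an assumption.
    """
    return sum(rsa_percent >= c for c in cutoffs)


if __name__ == "__main__":
    # Three of four residues agree.
    print(q3("HHEC", "HHCC"))
    # Two-state and three-state binning of the same RSA value.
    print(rsa_state(30.0), rsa_state(30.0, cutoffs=(25.0, 75.0)))
```

Under this sketch, a prediction-accuracy figure such as the 79.3% two-state result would be obtained by binning both predicted and observed RSA values with `rsa_state` and then counting matching bins in the same way `q3` counts matching secondary-structure labels.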

(c) 2006 Wiley-Liss, Inc.
