Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training
Proteins. 2007 Mar 1;66(4):838-45.
doi: 10.1002/prot.21298.
- PMID: 17177203
Ofer Dor et al. Proteins. 2007.
Abstract
An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q3 (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. Accuracies of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) are also obtained.
(c) 2006 Wiley-Liss, Inc.
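The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not SPINE's code: the function names `q3_accuracy` and `rsa_state`, the one-letter state labels H/E/C, and the choice to treat a value equal to a cutoff as belonging to the more exposed class are all assumptions made for illustration.

```python
from typing import Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Q3: percentage of residues whose predicted three-state label
    (assumed here to be H, E, or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def rsa_state(rsa_percent: float, cutoffs: Sequence[float] = (25.0, 75.0)) -> int:
    """Map a relative solvent accessibility (in percent) to a class index.

    With the default 25%/75% cutoffs this gives three states (0, 1, 2);
    with cutoffs=(25.0,) it gives two states (0, 1).
    Assumption: a value exactly at a cutoff falls in the higher class.
    """
    return sum(rsa_percent >= c for c in cutoffs)


if __name__ == "__main__":
    # Five of six residues agree, so Q3 is 5/6 expressed as a percentage.
    print(round(q3_accuracy("HHHECC", "HHEECC"), 1))
    # Three-state classes for a buried, an intermediate, and an exposed residue.
    print([rsa_state(v) for v in (10.0, 50.0, 90.0)])
    # Two-state class with the single 25% cutoff.
    print(rsa_state(30.0, cutoffs=(25.0,)))
```

In a 10-fold cross-validation such as the one reported, a score of this kind would be computed on each held-out fold and then combined across folds; how SPINE pools the residues is not stated in the abstract.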
Similar articles
- Dor O, Zhou Y. Proteins. 2007 Jul 1;68(1):76-81. doi: 10.1002/prot.21408. PMID: 17397056
- Combining prediction of secondary structure and solvent accessibility in proteins. Adamczak R, Porollo A, Meller J. Proteins. 2005 May 15;59(3):467-75. doi: 10.1002/prot.20441. PMID: 15768403
- Accurate prediction of solvent accessibility using neural networks-based regression. Adamczak R, Porollo A, Meller J. Proteins. 2004 Sep 1;56(4):753-67. doi: 10.1002/prot.20176. PMID: 15281128
- Protein secondary structure prediction. Pirovano W, Heringa J. Methods Mol Biol. 2010;609:327-48. doi: 10.1007/978-1-60327-241-4_19. PMID: 20221928 Review.
- Review: protein secondary structure prediction continues to rise. Rost B. J Struct Biol. 2001 May-Jun;134(2-3):204-18. doi: 10.1006/jsbi.2001.4336. PMID: 11551180 Review.
Cited by
- Improving computational protein design by using structure-derived sequence profile. Dai L, Yang Y, Kim HR, Zhou Y. Proteins. 2010 Aug 1;78(10):2338-48. doi: 10.1002/prot.22746. PMID: 20544969 Free PMC article.
- A dynamic Bayesian network approach to protein secondary structure prediction. Yao XQ, Zhu H, She ZS. BMC Bioinformatics. 2008 Jan 25;9:49. doi: 10.1186/1471-2105-9-49. PMID: 18218144 Free PMC article.
- Trends in template/fragment-free protein structure prediction. Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Theor Chem Acc. 2011 Jan;128(1):3-16. doi: 10.1007/s00214-010-0799-2. Epub 2010 Sep 1. PMID: 21423322 Free PMC article.
- Zheng H, Wu H. BMC Bioinformatics. 2010 Dec 14;11 Suppl 11(Suppl 11):S7. doi: 10.1186/1471-2105-11-S11-S7. PMID: 21172057 Free PMC article.
- Characterization and Prediction of Protein Flexibility Based on Structural Alphabets. Dong Q, Wang K, Liu B, Liu X. Biomed Res Int. 2016;2016:4628025. doi: 10.1155/2016/4628025. Epub 2016 Aug 30. PMID: 27660756 Free PMC article.