
MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins

Sun Jan 01 2012

Comparative Study


Fatemeh Miri Disfani et al. Bioinformatics. 2012.

Abstract

Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs are known, which motivates the development of computational methods that predict MoRFs from protein chains.

Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom-designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physicochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred, which predicts α-MoRFs, and ANCHOR, which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation, along with the fact that predictions with higher probability are more accurate, to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs: they are characterized by dips in the disorder predictions and by higher hydrophobicity and stability when compared to adjacent (in the chain) residues.

Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf.


Figures

Fig. 1.

Architecture of the MoRFpred method

Fig. 2.

Comparison of ROC curves for MoRFpred and ANCHOR on the test dataset. The curves are shown for FPR < 0.1.
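As a rough illustration of what restricting a ROC curve to FPR < 0.1 involves, the sketch below sweeps a score threshold from high to low and keeps only the points in the low-false-positive range. The function name `roc_points`, its parameters and the toy data are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
def roc_points(scores, labels, max_fpr=0.1):
    """Return (FPR, TPR) points with FPR strictly below max_fpr.

    scores: per-residue prediction scores (higher = more MoRF-like).
    labels: 1 for native MoRF residues, 0 for non-MoRF residues.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")

    # Rank residues from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Residues with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr = fp / neg
        if fpr >= max_fpr:
            break  # outside the low-FPR range shown in the figure
        points.append((fpr, tp / pos))
    return points


# Toy data (hypothetical): 2 MoRF residues among 22, ranked by score.
toy_scores = list(range(22, 0, -1))
toy_labels = [1, 0, 1] + [0] * 19
low_fpr_curve = roc_points(toy_scores, toy_labels)
```

Focusing on a low FPR range is a common choice when negatives vastly outnumber positives, as is the case for MoRF residues within full protein chains.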

Fig. 3.

Analysis of the top-ranked features that serve as sequence-derived markers of MoRFs. The average values of the top five ranked features used by MoRFpred, which are shown on the x-axis, are compared for the native MoRF residues (light gray bars) and native non-MoRF residues (dark gray bars). The corresponding standard deviations are shown using the error bars. Each of the five selected features represents an average difference of a given quantity (predicted disorder, stability or transfer energy). Negative values mean that the average in the inner window of size w was higher than the average in the flanking segments.
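The window-difference features described in this caption can be sketched as follows. This is a minimal sketch, not the MoRFpred implementation: the function name `window_difference`, the `flank` length parameter and the toy profile are assumptions, since the caption only names the inner window size w. The sign convention follows the caption: the result is negative when the inner window average exceeds the flanking average.

```python
from statistics import mean


def window_difference(values, center, w, flank):
    """Flank average minus inner-window average around one residue.

    values: per-residue quantity, e.g. predicted disorder.
    center: index of the residue of interest.
    w:      inner window size (odd), centered on `center`.
    flank:  number of residues taken on each side of the inner
            window (hypothetical parameter, not from the source).
    """
    half = w // 2
    lo = max(0, center - half)
    hi = min(len(values), center + half + 1)
    inner = values[lo:hi]

    # Flanking segments, truncated at the chain termini.
    left = values[max(0, lo - flank):lo]
    right = values[hi:hi + flank]
    flanks = left + right
    if not flanks:
        return 0.0  # no flanking residues available

    return mean(flanks) - mean(inner)


# Toy disorder profile with a dip in the middle (hypothetical values).
disorder = [0.9, 0.9, 0.2, 0.2, 0.2, 0.9, 0.9]
dip_feature = window_difference(disorder, center=3, w=3, flank=2)
```

With this convention, a dip in predicted disorder relative to the flanks gives a positive value, while a locally elevated quantity gives a negative one.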

Fig. 4.

Prediction of MoRF residues for the Histone H2A protein by the ANCHOR (blue lines), MoRFpred (orange lines), α-MoRF-PredI (thick red line) and α-MoRF-PredII (thick green line) predictors. The x-axis shows positions in the protein sequence. Probability values are only available for ANCHOR and MoRFpred and are shown by thin blue and orange lines, respectively, at the top of the figure. The cutoff of 0.5 used to convert probabilities into binary predictions for ANCHOR and MoRFpred is shown using a brown horizontal line. The native MoRF regions are annotated using a black horizontal line. The binary predictions from ANCHOR, α-MoRF-PredI, α-MoRF-PredII and MoRFpred are denoted using blue (at the −0.1 point on the y-axis), red (at −0.2), green (at −0.3) and orange (at −0.4) horizontal lines. The absence of red and green lines means that α-MoRF-PredI and α-MoRF-PredII did not predict any MoRFs.
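The conversion of per-residue probabilities into binary predictions at the 0.5 cutoff, as drawn in this figure, can be sketched as below. The function name `predicted_regions` and the example probabilities are hypothetical, and treating a value exactly equal to the cutoff as a positive prediction is an assumption, since the caption does not say which side the boundary falls on.

```python
def predicted_regions(probabilities, cutoff=0.5):
    """Group residues at or above the cutoff into contiguous regions.

    Returns a list of (start, end) index pairs, 0-based and inclusive.
    """
    regions = []
    start = None
    for i, p in enumerate(probabilities):
        if p >= cutoff:  # assumption: the cutoff itself counts as positive
            if start is None:
                start = i  # a new predicted region opens here
        elif start is not None:
            regions.append((start, i - 1))  # region closed by a low value
            start = None
    if start is not None:
        # The last region runs to the end of the chain.
        regions.append((start, len(probabilities) - 1))
    return regions


# Hypothetical per-residue probabilities for a six-residue stretch.
example_regions = predicted_regions([0.1, 0.6, 0.7, 0.2, 0.5, 0.9])
```

Each returned pair corresponds to one horizontal segment of the kind drawn below the x-axis in the figure.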


