In-silico prediction of disorder content using hybrid sequence representation

Sat Jan 01 2011

Marcin J Mizianty et al. BMC Bioinformatics. 2011.

Abstract

Background: Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies, including estimating the abundance of disorder in protein families, classes, and complete proteomes, and analyzing disorder-related protein functions. These investigations currently use disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of the disorder content.
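To make the notion of content derived from per-residue predictions concrete, here is a minimal sketch. It assumes that a residue is called disordered when its propensity is at or above a cut-off, and that content is the fraction of such residues in the chain. The function name and the default cut-off of 0.5 are illustrative assumptions, not values taken from the paper.

```python
def content_from_residues(propensities, cutoff=0.5):
    """Fraction of residues whose disorder propensity reaches the cut-off.

    `cutoff=0.5` is an assumed default; each residue-level predictor
    defines its own cut-off.
    """
    if not propensities:
        raise ValueError("empty chain")
    disordered = sum(1 for p in propensities if p >= cutoff)
    return disordered / len(propensities)


# Made-up propensities for a four-residue toy chain.
toy_chain = [0.2, 0.7, 0.9, 0.4]
toy_content = content_from_residues(toy_chain)
```

Because the result depends on the per-residue cut-off, a small shift in that cut-off can move the derived content noticeably, which is one way such estimates can end up too high or too low.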

Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than content extracted from local, window-based, residue-level disorder predictors. We propose a novel predictor, DisCon, that uses a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error of 0.05 and a high Pearson correlation of 0.68. This is a statistically significant improvement over the content computed from the outputs of ten modern disorder predictors on a test dataset of proteins that share low sequence identity with the training sequences. We analyze the proposed predictive model to discuss factors related to the prediction of the disorder content.
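The ridge regression and the two evaluation measures named above can be sketched in plain Python. This is an illustration of the general technique only: the closed-form solution, the unpenalized intercept, the helper names, and the toy data are all assumptions, and the 29 DisCon descriptors are not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(features, targets, lam):
    """Closed-form ridge: w = (X^T X + lam*I)^-1 X^T y.

    A leading column of ones models the intercept, which is left
    unpenalized (an assumption of this sketch). Returns [intercept, w1, ...].
    """
    rows = [[1.0] + list(r) for r in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        xtx[i][i] += lam
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear(xtx, xty)


def ridge_predict(weights, features):
    """Predicted content for each feature vector."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in features]


def mean_squared_error(pred, true):
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(pred, true):
    """Pearson correlation; undefined when either input is constant."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in true)
    return cov / math.sqrt(vp * vt)


# Made-up data: one descriptor per chain and the native content.
toy_x = [[0.0], [0.2], [0.4], [1.0]]
toy_y = [0.1, 0.2, 0.3, 0.6]
toy_w = ridge_fit(toy_x, toy_y, lam=1.0)
toy_pred = ridge_predict(toy_w, toy_x)
toy_scores = (mean_squared_error(toy_pred, toy_y), pearson(toy_pred, toy_y))
```

In practice the regularization strength would be chosen by cross-validation on the training set, and the scores would be reported on held-out chains rather than on the training data as in this toy usage.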

Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.

Figures

Figure 1

Architecture of the DisCon predictor.

Figure 2

The MCC values (y-axis) for the binary prediction of chains whose disorder content is below or above the cut-off value shown on the x-axis. The binary predictions are computed by thresholding the predicted disorder content generated by DisCon and the 10 considered disorder predictors on the test dataset.
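The chain-level evaluation described in this caption can be sketched as follows. The sketch assumes that a chain is labeled positive when its content is strictly above the cut-off (the caption does not say how ties are treated), and the function names and data are illustrative.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def chain_level_mcc(pred_content, native_content, cutoff):
    """Threshold predicted and native content at `cutoff`, then score with MCC."""
    tp = tn = fp = fn = 0
    for p, t in zip(pred_content, native_content):
        pred_pos, true_pos = p > cutoff, t > cutoff  # strict '>' is an assumption
        if pred_pos and true_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif true_pos:
            fn += 1
        else:
            tn += 1
    return mcc(tp, tn, fp, fn)


# Made-up content values for four chains.
toy_native = [0.05, 0.2, 0.6, 0.9]
toy_pred = [0.5, 0.3, 0.5, 0.8]
toy_curve = [(c, chain_level_mcc(toy_pred, toy_native, c)) for c in (0.1, 0.4, 0.7)]
```

Sweeping the cut-off, as in `toy_curve`, gives one MCC value per x-axis position for a single predictor.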

Figure 3

The MCC values for the residue-level disorder prediction adjusted using content predicted by DisCon. The bar chart includes the original predictions (densely dotted red bars); predictions with a fixed cut-off that is optimized to maximize MCC on the entire test dataset (sparsely dotted blue bars); predictions where the cut-off is adjusted using the content predicted by DisCon (solid black bars); and predictions where the cut-off is adjusted using the content predicted by MD if its value is > 0.65 or < 0.1, and the content predicted by DisCon otherwise (solid green bars). The results were computed on the test dataset, and the methods on the x-axis are sorted by their original MCC values.
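One plausible reading of the content-based cut-off adjustment is sketched below. The caption gives the MD thresholds (0.65 and 0.1) but not the exact adjustment procedure, so the rank-based rule here (label the highest-propensity residues until the predicted content is matched) and all function names are assumptions.

```python
def select_content(md_content, discon_content, high=0.65, low=0.1):
    """Use MD's content when it is extreme, otherwise DisCon's.

    The thresholds come from the caption; the function itself is a sketch.
    """
    if md_content > high or md_content < low:
        return md_content
    return discon_content


def adjust_to_content(propensities, content):
    """Label the top-ranked residues as disordered (1) so that their
    fraction matches `content` as closely as rounding allows.

    Assumed procedure: equivalent to choosing a per-chain cut-off at the
    k-th highest propensity, with ties broken by sequence position.
    """
    n = len(propensities)
    k = max(0, min(n, round(content * n)))
    order = sorted(range(n), key=lambda i: propensities[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


# Made-up propensities for a five-residue toy chain.
toy_propensities = [0.1, 0.9, 0.3, 0.8, 0.2]
toy_labels = adjust_to_content(toy_propensities, select_content(0.3, 0.4))
```

The design keeps each predictor's ranking of residues and changes only how many of them are called disordered.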

Figure 4

Prediction of disordered residues in the apoptosis-inducing ligand 2 (Apo2L) protein (PDB ID 1DG6 chain A) by the Ucon (thin blue line), PONDR-FIT (thin red line), MD (thin green line), MFDp (thin gray line), IUPredS (thin pink line), and DISOPRED2 (thin cyan line) predictors. The original cut-offs are shown using dashed lines. The native disordered regions are annotated using a black horizontal line. The original binary predictions from Ucon, PONDR-FIT, MD, MFDp, IUPredS, and DISOPRED2 are denoted using blue (at the -0.1 point on the y-axis), red (at -0.2), green (at -0.3), gray (at -0.4), pink (at -0.5), and cyan (at -0.6) horizontal lines. The binary predictions that were adjusted to match the content predicted with DisCon are shown using horizontal bright green lines located immediately under the lines that show the original predictions.

Figure 5

Prediction of disordered residues in the inosine-5'-monophosphate dehydrogenase protein (DisProt ID DP00399) by the Ucon (thin blue line), PONDR-FIT (thin red line), MD (thin green line), MFDp (thin gray line), IUPredS (thin pink line), and DISOPRED2 (thin cyan line) predictors. The original cut-offs are shown using dashed lines. The native disordered regions are annotated using a black horizontal line. The original binary predictions from Ucon, PONDR-FIT, MD, MFDp, IUPredS, and DISOPRED2 are denoted using blue (at the -0.1 point on the y-axis), red (at -0.2), green (at -0.3), gray (at -0.4), pink (at -0.5), and cyan (at -0.6) horizontal lines. The binary predictions that were adjusted to match the content predicted with DisCon are shown using horizontal bright green lines located immediately under the lines that show the original predictions.

Figure 6

Scatter plots of the relations between the values of four selected input features, SS_HE-DOM_in-BFNS_low-RSA_B (green circle markers), BFNS_low-Seg_10 (red triangle markers), SS_CH-BFNS_high-DOM_notin (black × markers), and CHC...CHSeg (blue hollow circle markers), shown on the y-axis, and the native disorder content given on the x-axis. The lines correspond to linear regressions with the corresponding R² values.
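The per-feature regression lines and R² values in such a plot can be reproduced with ordinary least squares. The sketch below is a generic single-variable fit; the function name and the toy numbers are assumptions, and no actual feature values from the paper are used.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = intercept + slope * x and its R^2.

    Here x would be the native disorder content and y one feature's value.
    Undefined when x or y is constant (division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up values: native content (x) against one feature (y).
toy_fit = linear_fit_r2([0.0, 0.25, 0.5, 1.0], [0.9, 0.8, 0.4, 0.1])
```

A negative slope indicates a feature that decreases as the native disorder content grows; R² reports how much of the feature's variance the line explains.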
