In-silico prediction of disorder content using hybrid sequence representation
Sat Jan 01 2011
Marcin J Mizianty et al. BMC Bioinformatics. 2011.
Abstract
Background: Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the content of intrinsic disorder in proteins is useful for a variety of studies, including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. These investigations currently use the disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of the disorder content.
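As a minimal sketch of how disorder content is typically derived from per-residue predictions, the snippet below counts the fraction of residues whose propensity reaches a cut-off. The function name `disorder_content`, the 0.5 cut-off, and the example scores are illustrative assumptions, not values taken from the paper.

```python
def disorder_content(propensities, cutoff=0.5):
    """Fraction of residues whose predicted disorder propensity is >= cutoff.

    `cutoff=0.5` is an assumed default; each residue-level predictor
    defines its own threshold.
    """
    if not propensities:
        raise ValueError("propensities must contain at least one residue")
    disordered = sum(1 for p in propensities if p >= cutoff)
    return disordered / len(propensities)


# Hypothetical per-residue propensities for a 10-residue chain.
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.1, 0.3, 0.6, 0.95]
content = disorder_content(scores)
print(content)
```

Because the content is an aggregate of many thresholded residue-level decisions, a systematic bias in the propensities or in the cut-off shifts the whole estimate, which is the over- or under-prediction the authors describe.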
Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than the content extracted from local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error (0.05) and a high Pearson correlation (0.68). This is a statistically significant improvement over the content computed from the outputs of ten modern disorder predictors on a test dataset of proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content.
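The two evaluation measures named above can be sketched as follows. This is a plain-Python illustration of mean squared error and Pearson correlation between native and predicted content; the function names and the toy values are assumptions and do not reproduce the paper's data or its ridge regression model.

```python
import math


def mean_squared_error(native, predicted):
    """Average squared difference between native and predicted content."""
    return sum((n - p) ** 2 for n, p in zip(native, predicted)) / len(native)


def pearson_correlation(native, predicted):
    """Pearson correlation coefficient between two equal-length lists."""
    count = len(native)
    mean_x = sum(native) / count
    mean_y = sum(predicted) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(native, predicted))
    var_x = sum((x - mean_x) ** 2 for x in native)
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical native and predicted disorder content for three chains.
native_content = [0.0, 0.5, 1.0]
predicted_content = [0.1, 0.5, 0.9]
print(mean_squared_error(native_content, predicted_content))
print(pearson_correlation(native_content, predicted_content))
```

The two measures are complementary: the correlation captures whether predictions rank chains correctly, while the mean squared error also penalizes predictions that are consistently compressed or shifted, as in the toy example.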
Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve the binary annotations of disordered residues derived from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
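One way to picture how a predicted content value could adjust binary annotations is to mark the top-scoring residues as disordered until their fraction matches the predicted content. The sketch below is an assumption about how such matching might be done, not the authors' procedure; `content_matched_labels` and the example values are hypothetical.

```python
def content_matched_labels(propensities, target_content):
    """Label the highest-propensity residues as disordered (1) so that the
    labelled fraction is as close as possible to `target_content`."""
    length = len(propensities)
    # Number of residues to mark, clamped to the valid range.
    n_disordered = max(0, min(length, round(target_content * length)))
    # Residue indices ranked from highest to lowest propensity.
    ranked = sorted(range(length), key=lambda i: propensities[i], reverse=True)
    labels = [0] * length
    for index in ranked[:n_disordered]:
        labels[index] = 1
    return labels


# Hypothetical propensities for a 5-residue chain and a predicted content of 0.4.
scores = [0.9, 0.2, 0.6, 0.4, 0.1]
labels = content_matched_labels(scores, 0.4)
print(labels)
```

Ranking residues is equivalent to choosing a chain-specific cut-off on the propensities, so the relative ordering produced by the residue-level predictor is preserved and only the number of residues called disordered changes.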
Figures
![Figure 1](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/f9cd90962f29/1471-2105-12-245-1.gif)
Architecture of the DisCon predictor.
![Figure 2](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/015594a8cfde/1471-2105-12-245-2.gif)
The MCC values (y-axis) for the binary prediction of chains that are characterized by the amount of the disorder below/above a cut-off value shown on the x-axis. The binary predictions are computed by thresholding the predicted disorder content generated by DisCon and the 10 considered disorder predictors on the test dataset.
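The chain-level evaluation described in this caption can be sketched as follows: native and predicted content are both thresholded at the same cut-off, and the resulting binary labels are compared with the Matthews correlation coefficient (MCC). The function names and example values are illustrative assumptions; returning 0.0 when the MCC denominator is zero is a common convention, not something stated in the source.

```python
import math


def mcc(native_labels, predicted_labels):
    """Matthews correlation coefficient for two lists of booleans."""
    pairs = list(zip(native_labels, predicted_labels))
    tp = sum(1 for n, p in pairs if n and p)
    tn = sum(1 for n, p in pairs if not n and not p)
    fp = sum(1 for n, p in pairs if not n and p)
    fn = sum(1 for n, p in pairs if n and not p)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def chain_level_mcc(native_content, predicted_content, cutoff):
    """MCC for classifying chains as having disorder content above `cutoff`."""
    native = [c > cutoff for c in native_content]
    predicted = [c > cutoff for c in predicted_content]
    return mcc(native, predicted)


# Hypothetical content values for four chains, evaluated at a 0.3 cut-off.
native_content = [0.05, 0.2, 0.6, 0.9]
predicted_content = [0.1, 0.4, 0.5, 0.8]
print(chain_level_mcc(native_content, predicted_content, 0.3))
```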
![Figure 3](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/f272d7adf1b6/1471-2105-12-245-3.gif)
The MCC values for the residue-level disorder prediction adjusted using content predicted by DisCon. The bar chart includes the original predictions (densely dotted red bars), predictions with a fixed cut-off that is optimized to maximize MCC on the entire test dataset (sparsely dotted blue bars), predictions where the cut-off is adjusted using the content predicted by DisCon (solid black bars), and predictions where the cut-off is adjusted using the content predicted by MD when its value is > 0.65 or < 0.1 and the content predicted by DisCon otherwise (solid green bars). The results were computed on the test dataset, and the methods on the x-axis are sorted by their original MCC values.
![Figure 4](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/9411bd694c86/1471-2105-12-245-4.gif)
Prediction of disordered residues in the apoptosis-inducing ligand 2 (Apo2L) protein (PDB ID 1DG6 chain A) by the Ucon (thin blue line), PONDR-FIT (thin red line), MD (thin green line), MFDp (thin gray line), IUPredS (thin pink line), and DISOPRED2 (thin cyan line) predictors. The original cut-offs are shown using dashed lines. The native disordered regions are annotated using a black horizontal line. The original binary predictions from Ucon, PONDR-FIT, MD, MFDp, IUPredS, and DISOPRED2 are denoted using blue (at the -0.1 point on the y-axis), red (at -0.2), green (at -0.3), gray (at -0.4), pink (at -0.5), and cyan (at -0.6) horizontal lines. The binary predictions that were adjusted to match the content predicted with DisCon are shown using horizontal bright green lines located immediately under the lines that show the original predictions.
![Figure 5](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/d6a1d23c397d/1471-2105-12-245-5.gif)
Prediction of disordered residues in the inosine-5'-monophosphate dehydrogenase protein (DisProt ID DP00399) by the Ucon (thin blue line), PONDR-FIT (thin red line), MD (thin green line), MFDp (thin gray line), IUPredS (thin pink line), and DISOPRED2 (thin cyan line) predictors. The original cut-offs are shown using dashed lines. The native disordered regions are annotated using a black horizontal line. The original binary predictions from Ucon, PONDR-FIT, MD, MFDp, IUPredS, and DISOPRED2 are denoted using blue (at the -0.1 point on the y-axis), red (at -0.2), green (at -0.3), gray (at -0.4), pink (at -0.5), and cyan (at -0.6) horizontal lines. The binary predictions that were adjusted to match the content predicted with DisCon are shown using horizontal bright green lines located immediately under the lines that show the original predictions.
![Figure 6](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3d/3212983/de931138db4b/1471-2105-12-245-6.gif)
Scatter plots of the relations between the values of four selected input features, SS_HE-DOM_in-BFNS_low-RSA_B (green circle markers), BFNS_low-Seg_10 (red triangle markers), SS_CH-BFNS_high-DOM_notin (black × markers), and CHC...CHSeg (blue hollow circle markers), shown on the y-axis, and the native disorder content, given on the x-axis. The lines correspond to linear regressions with the corresponding R² values.