Assessment of the assessment: evaluation of the model quality estimates in CASP10
Proteins. 2014 Feb;82(Suppl 2):112-26. doi: 10.1002/prot.24347. Epub 2013 Aug 31.
- PMID: 23780644
- PMCID: PMC4406045
- DOI: 10.1002/prot.24347
Andriy Kryshtafovych et al. Proteins. 2014 Feb.
Abstract
The article presents an assessment of the ability of the 37 model quality assessment (MQA) methods participating in CASP10 to provide an a priori estimation of the quality of structural models, and of the 67 tertiary structure prediction groups to provide confidence estimates for their predicted coordinates. The assessment of MQA predictors is based on the methods used in previous CASPs, such as the correlation between the predicted and observed quality of the models (at both the global and local levels), the accuracy of methods in distinguishing between good and bad models, as well as between good and bad regions within them, and the ability to identify the best models in the decoy sets. Several numerical evaluations were used in our analysis for the first time, such as the comparison of global and local quality predictors with reference (baseline) predictors and a ROC analysis of the predictors' ability to differentiate between well and poorly modeled regions. For the evaluation of the reliability of self-assessed coordinate errors, we used the correlation between the predicted and observed deviations of the coordinates and a ROC analysis of correctly identified errors in the models. A modified two-stage procedure for testing MQA methods in CASP10, whereby a small number of models spanning the whole range of model accuracy was released first, followed by a larger number of models of more uniform quality, allowed a more thorough analysis of the abilities and limitations of different types of methods. Clustering methods were shown to have an advantage over the single- and quasi-single-model methods on the larger datasets. At the same time, the evaluation revealed that the size of the dataset has a smaller influence on the global quality assessment scores (for both clustering and nonclustering methods) than its diversity. Narrowing the quality range of the assessed models caused a significant decrease in ranking accuracy for global quality predictors but essentially did not change the results for local predictors. Self-assessed error estimates submitted by the majority of groups were poor overall, with two research groups showing significantly better results than the rest.
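The core of the global-quality evaluation is the agreement, target by target, between a group's predicted model scores and the observed quality (GDT_TS) of those models. A minimal illustrative sketch of such a per-target correlation, assuming a simple nested-dictionary layout for the scores (the data structures and function name here are assumptions, not the official CASP evaluation code):

```python
from scipy.stats import pearsonr

def per_target_correlations(predicted, observed):
    """Return {target: Pearson r} over models scored in both dictionaries."""
    correlations = {}
    for target, pred_scores in predicted.items():
        obs_scores = observed.get(target, {})
        common = sorted(set(pred_scores) & set(obs_scores))
        if len(common) < 2:
            continue  # Pearson r is undefined for fewer than two points
        x = [pred_scores[m] for m in common]
        y = [obs_scores[m] for m in common]  # observed quality, e.g. GDT_TS
        r, _ = pearsonr(x, y)
        correlations[target] = r
    return correlations
```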
Keywords: CASP; QA; model quality assessment; protein structure modeling; protein structure prediction.
Copyright © 2013 Wiley Periodicals, Inc.
Figures

Figure 1. Correlation coefficients for the 37 participating groups and the reference Davis-QAconsensus benchmarking method in the per-target assessment (A, B) and in the assessment with all targets pooled together (C, D). Panels (A) and (C) show the results for the B150 datasets; panels (B) and (D) those for the S20 datasets. Bars showing z-scores are drawn in darker colors; correlation coefficients in lighter colors. The z-scores for the reference method are calculated from the mean and standard deviation of the correlation coefficients of the 37 participating predictors. Clustering methods are shown in black, single-model methods in blue, quasi-single-model methods in green, and the reference method in red.
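As the caption notes, the reference method's z-score is obtained by standardizing its correlation coefficient against the distribution of the participating groups' coefficients. A small sketch of that standardization, assuming the sample standard deviation is used (the caption does not specify which variant):

```python
import statistics

def reference_zscore(group_correlations, reference_correlation):
    """Standardize the reference method's r against the groups' distribution."""
    mu = statistics.mean(group_correlations)
    sigma = statistics.stdev(group_correlations)  # sample standard deviation
    return (reference_correlation - mu) / sigma
```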

Figure 2. Ability of the QA predictors to identify the best models in the decoy sets. The analysis was carried out on the 74/75 targets for which at least one structural model had a GDT_TS score over 40 in the S20/B150 datasets, respectively. Coloring of the methods is the same as in Figure 1. (A) Average difference in quality between the models predicted to be the best and the actual best models. For each group, ΔGDT_TS scores are calculated for every target and averaged over all predicted targets in every dataset. The lower the score, the better the group's performance. (B, C) Stacked bars showing the percentage of predictions where the model estimated to be the best in the B150 (B) and S20 (C) datasets is 0–2, 2–10, or >10 GDT_TS units away from the actual best model. Groups are sorted according to the results in the 0–2 bin (green bars).
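The loss measure in panel (A) is the GDT_TS gap between the actual best model in a decoy set and the model the group ranked highest. A hedged sketch of that per-target calculation (the per-target dictionary layout and function name are illustrative assumptions):

```python
def gdt_ts_loss(predicted_scores, observed_gdt_ts):
    """GDT_TS gap between the actual best model and the group's top pick."""
    picked = max(predicted_scores, key=predicted_scores.get)  # model ranked first
    best = max(observed_gdt_ts.values())                      # actual best model
    return best - observed_gdt_ts[picked]  # 0 means the best model was found
```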

Figure 3. ROC curves for the binary classification of models into two classes: good (GDT_TS ≥ 50) and bad (otherwise). Group names are listed top to bottom in order of decreasing AUC score. The background color of each group name differentiates method types according to the coloring scheme in Figure 1 (clustering methods have a transparent background). Statistically indistinguishable groups are boxed in the legend and provided with data markers. The data for the reference consensus method are plotted as a thicker black line. The data are shown for the best 25 groups only. For clarity, only the upper left quarter of a typical ROC plot is shown (FPR ≤ 0.5, TPR ≥ 0.5). The inset shows the AUC scores for all groups and for two different definitions of model correctness: GDT_TS ≥ 40 and GDT_TS ≥ 50.
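In this analysis each model is a binary case (good if its GDT_TS meets the cutoff), scored by the group's predicted quality. A sketch using scikit-learn (the choice of library is an assumption; any ROC implementation would do):

```python
from sklearn.metrics import roc_curve, roc_auc_score

def model_classification_roc(predicted_quality, observed_gdt_ts, cutoff=50.0):
    """ROC of predicted quality as a score for the 'good model' class."""
    labels = [1 if gdt >= cutoff else 0 for gdt in observed_gdt_ts]
    fpr, tpr, _ = roc_curve(labels, predicted_quality)
    return fpr, tpr, roc_auc_score(labels, predicted_quality)
```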

Figure 4. Correlation analysis for 19 individual groups and the Davis-QAconsensus reference method in the per-residue quality prediction category. Correlation results are calculated on a per-model basis and subsequently averaged over all models and all targets in the B150 (A) and S20 (B) datasets.

Figure 5. Accuracy of the binary classification of residues (reliable/unreliable) based on the ROC analysis. A residue in a model is defined to be correct when its Cα atom is within 3.8 Å of the corresponding residue in the target structure. Group names are listed top to bottom in order of decreasing AUC score. The background color of each group name differentiates method types according to the coloring scheme in Figure 1 (clustering methods have a transparent background). The data for the reference consensus method are plotted as a thicker black line. For clarity, only the left half of a typical ROC plot is shown (FPR ≤ 0.5). The inset shows the AUC scores for all groups.
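The residue-level ROC analysis is analogous to the model-level one, except that predicted per-residue errors must be inverted so that higher scores correspond to the positive ("reliable") class. A sketch under those assumptions:

```python
from sklearn.metrics import roc_auc_score

def residue_auc(predicted_errors, observed_deviations, cutoff=3.8):
    """AUC for classifying residues as reliable (C-alpha deviation <= cutoff)."""
    labels = [1 if dev <= cutoff else 0 for dev in observed_deviations]
    scores = [-err for err in predicted_errors]  # small predicted error => high score
    return roc_auc_score(labels, scores)
```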

Figure 6. Self-assessment of per-residue model confidence estimates (QA3) on "all-group" targets. (A) Log-linear Pearson correlation coefficients for the top 20 groups. (B) ROC curves for the participating CASP10 methods and two baseline single-model QA methods, ProQ2 and QMEAN. The data for the two baseline methods are plotted as thicker black lines; the QMEAN curve is dashed. The inset shows the AUC scores and 95% confidence intervals for the top eight methods, including the baselines. (C, D) Predicted residue errors versus observed distances for two CASP10 groups: IntFOLD2 (C) and ProQ2 (D).
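One plausible reading of the caption's "log-linear Pearson correlation" (the exact transform is an assumption not spelled out here): Pearson r between the predicted residue errors and the logarithm of the observed Cα deviations, which damps the influence of very large deviations.

```python
import math
from scipy.stats import pearsonr

def log_linear_r(predicted_errors, observed_deviations, eps=0.1):
    """Pearson r between predicted errors and log-scaled observed deviations."""
    logged = [math.log(dev + eps) for dev in observed_deviations]  # eps guards log(0)
    r, _ = pearsonr(predicted_errors, logged)
    return r
```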

Figure 7. Comparison of the results of the best 20 groups (QA1) and the best 12 groups (QA2, QA3) in the last two CASPs according to various evaluation metrics. Groups are sorted from best to worst in each CASP. (A-D) Correlation coefficients in the (A) QA1.1, (B) QA1.2, (C) QA2.1, and (D) QA3 evaluation modes. Filled data markers and bolder trendlines in the QA3 analysis (panels D and I) correspond to the assessment of all methods, while hollow markers and thinner trendlines correspond to server-only methods. (E, F) Ability of predictors to identify the best model: (E) average loss in quality due to nonoptimal selection; (F) ratios of recognized and missed best models (filled data markers and bolder trendlines: percentage of models recognized within 2 GDT_TS units; hollow markers and thinner trendlines: models missed by more than 10 GDT_TS units). (G-I) Success of predictors in distinguishing between (G) good and bad models and (H, I) correctly and incorrectly modeled regions in the QA2.2 (H) and QA3 (I) evaluation modes. The thresholds for defining a "correct" model/residue in the MCC and AUC calculations were set to 50 GDT_TS for the global classifier assessment (panel G) and to 3.8 Å for the local classifier assessment (panels H and I).
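For the MCC-based classifier assessment in panel (G), both observed and predicted qualities are reduced to binary labels. A sketch, assuming the same cutoff is applied to the predicted scores (the caption specifies the cutoff only for defining a "correct" model):

```python
from sklearn.metrics import matthews_corrcoef

def model_mcc(predicted_quality, observed_gdt_ts, cutoff=50.0):
    """MCC after thresholding both observed and predicted quality at the cutoff."""
    true_labels = [1 if gdt >= cutoff else 0 for gdt in observed_gdt_ts]
    pred_labels = [1 if score >= cutoff else 0 for score in predicted_quality]
    return matthews_corrcoef(true_labels, pred_labels)
```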
Similar articles
-
Kryshtafovych A, Barbato A, Monastyrskyy B, Fidelis K, Schwede T, Tramontano A. Proteins. 2016 Sep;84(Suppl 1):349-69. doi: 10.1002/prot.24919. Epub 2015 Sep 28. PMID: 26344049. Free PMC article.
-
Assessment of model accuracy estimations in CASP12.
Kryshtafovych A, Monastyrskyy B, Fidelis K, Schwede T, Tramontano A. Proteins. 2018 Mar;86(Suppl 1):345-360. doi: 10.1002/prot.25371. Epub 2017 Sep 8. PMID: 28833563. Free PMC article.
-
Assessment of protein disorder region predictions in CASP10.
Monastyrskyy B, Kryshtafovych A, Moult J, Tramontano A, Fidelis K. Proteins. 2014 Feb;82(Suppl 2):127-37. doi: 10.1002/prot.24391. Epub 2013 Nov 22. PMID: 23946100. Free PMC article.
-
Liu T, Wang Z. BMC Bioinformatics. 2020 Jul 6;21(Suppl 4):246. doi: 10.1186/s12859-020-3383-3. PMID: 32631256. Free PMC article. Review.
-
State-of-the-art web services for de novo protein structure prediction.
Abriata LA, Dal Peraro M. Brief Bioinform. 2021 May 20;22(3):bbaa139. doi: 10.1093/bib/bbaa139. PMID: 34020540. Review.
Cited by
-
Cao R, Wang Z, Cheng J. BMC Struct Biol. 2014 Apr 15;14:13. doi: 10.1186/1472-6807-14-13. PMID: 24731387. Free PMC article.
-
Energy-based graph convolutional networks for scoring protein docking models.
Cao Y, Shen Y. Proteins. 2020 Aug;88(8):1091-1099. doi: 10.1002/prot.25888. Epub 2020 Mar 16. PMID: 32144844. Free PMC article.
-
Uncertainty in integrative structural modeling.
Schneidman-Duhovny D, Pellarin R, Sali A. Curr Opin Struct Biol. 2014 Oct;28:96-104. doi: 10.1016/j.sbi.2014.08.001. Epub 2014 Aug 28. PMID: 25173450. Free PMC article. Review.
-
Umunnakwe CN, Loyd H, Cornick K, Chavez JR, Dobbs D, Carpenter S. Retrovirology. 2014 Dec 23;11:115. doi: 10.1186/s12977-014-0115-7. PMID: 25533001. Free PMC article.
-
Liu T, Wang Y, Eickholt J, Wang Z. Sci Rep. 2016 Jan 14;6:19301. doi: 10.1038/srep19301. PMID: 26763289. Free PMC article.