Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs

Zhen Chen et al. PLoS One. 2011.

Abstract

As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and is closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. Built on a Support Vector Machine (SVM), CKSAAP_UbSite takes as input the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence). When trained and tested on the dataset of yeast ubiquitination sites (Radivojac et al., Proteins, 2010, 78: 365-380), a 100-fold cross-validation on a 1:1 ratio of positive and negative samples showed that the accuracy and Matthews correlation coefficient (MCC) of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. CKSAAP_UbSite has also been extensively benchmarked and performs better than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
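To make the input encoding concrete, the following is a minimal sketch of how the composition of k-spaced amino acid pairs around a lysine might be computed. The abstract does not give the window size, the range of k, or how sequence ends are handled, so the values used here (13 flanking residues on each side, k from 0 to 4, a `-` padding symbol) are illustrative assumptions, and the function names are hypothetical rather than taken from the CKSAAP_UbSite software.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed symbol for positions beyond the sequence ends
ALPHABET = AMINO_ACIDS + PAD
# All ordered residue pairs over the 21-letter alphabet (21 * 21 = 441).
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def extract_window(sequence, site, flank=13):
    """Return the padded window of 2 * flank + 1 residues centred on a lysine.

    `flank` is an assumed value; the abstract does not state the window size.
    """
    if sequence[site] != "K":
        raise ValueError("query site must be a lysine (K)")
    left = sequence[max(0, site - flank):site]
    right = sequence[site + 1:site + 1 + flank]
    return (PAD * (flank - len(left)) + left + "K"
            + right + PAD * (flank - len(right)))


def cksaap_features(window, k_max=4):
    """Composition of k-spaced residue pairs for k = 0 .. k_max.

    For each k, every pair (window[i], window[i + k + 1]) is counted and the
    counts are divided by the number of such pairs in the window.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in PAIRS)
    return features


def encode_lysines(sequence, flank=13, k_max=4):
    """Map each lysine position (0-based) to its CKSAAP feature vector."""
    return {
        i: cksaap_features(extract_window(sequence, i, flank), k_max)
        for i, residue in enumerate(sequence)
        if residue == "K"
    }
```

Under these assumptions each lysine yields a vector of 441 × 5 = 2205 values, which could then be passed to an SVM classifier. For example, `encode_lysines("MKTAYIAKQR")` returns vectors for the lysines at positions 1 and 7.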

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. ROC curves of CKSAAP_UbSite and the binary encoding scheme based on balanced ubiquitination and non-ubiquitination sites.

The performance of CKSAAP_UbSite and the binary encoding scheme was assessed through a 100-fold cross-validation strategy.
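The abstract reports performance as accuracy and MCC. As a reminder of how these two measures are derived from a confusion matrix, here is a small sketch using the standard definitions; the counts in the usage note are made up for illustration and are not results from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention for the
    degenerate case where a whole row or column of the matrix is empty.
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

With hypothetical counts on a balanced set of 100 positive and 100 negative sites, `accuracy(70, 80, 20, 30)` gives 0.75 and `mcc(70, 80, 20, 30)` gives roughly 0.50. On balanced data, MCC ranges from -1 to 1, with 0 corresponding to random guessing.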

Figure 2. The composition of the top-25 residue pairs resulting from two feature selection methods.

The composition of each residue pair is represented by a radial vector whose length is proportional to the composition concerned.

Figure 3. Two Two-Sample-Logos of the position-specific residue composition surrounding the ubiquitination sites and non-ubiquitination sites, which were inferred from Radivojac_dataset (A) and Cai_dataset_1 (B), respectively.

These two logos were prepared using the web server http://www.twosamplelogo.org/ and only residues significantly enriched and depleted surrounding ubiquitination sites (t-test, P<0.05) are shown.

Figure 4. Comparison of CKSAAP_UbSite and UbPred based on an independent dataset of 21 proteins.

References

    1. Haglund K, Dikic I. Ubiquitylation and cell signaling. EMBO J. 2005;24:3353–3359.
    2. Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, et al. Identification, analysis, and prediction of protein ubiquitination sites. Proteins. 2010;78:365–380.
    3. Tung CW, Ho SY. Computational identification of ubiquitylation sites from protein sequences. BMC Bioinformatics. 2008;9:310.
    4. Hershko A, Ciechanover A. The ubiquitin system. Annu Rev Biochem. 1998;67:425–479.
    5. Hicke L. Protein regulation by monoubiquitin. Nat Rev Mol Cell Biol. 2001;2:195–201.
