High GC content causes orphan proteins to be intrinsically disordered - PubMed
Sun Jan 01 2017
Walter Basile et al. PLoS Comput Biol. 2017.
Abstract
De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we find that it can explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the frequency of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins at a given GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
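The idea that orphans resemble random proteins at a given GC level can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the sequence counts, and the choice of A, R, G, Q, S, P, E, K as the disorder-promoting set (a commonly cited classification) are all assumptions made for illustration. The sketch draws random codons at a chosen GC fraction, translates them with the standard genetic code, and reports the share of disorder-promoting residues.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption: a commonly cited set of disorder-promoting residues,
# not necessarily the scale used in the paper.
DISORDER_PROMOTING = set("ARGQSPEK")


def random_protein(gc, n_codons, rng):
    """Translate random codons whose bases are G or C with probability gc.

    Stop codons are redrawn so every protein has exactly n_codons residues.
    """
    weights = [(1 - gc) / 2, gc / 2, (1 - gc) / 2, gc / 2]  # T, C, A, G
    residues = []
    while len(residues) < n_codons:
        codon = "".join(rng.choices(BASES, weights=weights, k=3))
        aa = CODON_TABLE[codon]
        if aa != "*":
            residues.append(aa)
    return "".join(residues)


def disorder_promoting_fraction(protein):
    """Fraction of residues that belong to the assumed disorder-promoting set."""
    return sum(aa in DISORDER_PROMOTING for aa in protein) / len(protein)


def mean_fraction(gc, n_proteins=200, n_codons=100, seed=0):
    """Average disorder-promoting fraction over many random proteins."""
    rng = random.Random(seed)
    total = sum(
        disorder_promoting_fraction(random_protein(gc, n_codons, rng))
        for _ in range(n_proteins)
    )
    return total / n_proteins


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.1f}  disorder-promoting fraction={mean_fraction(gc):.3f}")
```

Under these assumptions the fraction rises with GC, which is the direction of the effect described in the abstract. A residue-composition count is only a crude proxy for predicted disorder.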
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures

The difference between orphans and ancient proteins is statistically significant for all the considered properties: the p-value of a rank-sum test is always <10^-141.


For clarity, only the ancient (blue) and orphan (red) proteins are shown individually, but the fitted linear regression lines for genus orphans (pink) and intermediate proteins (light blue) are also shown. The text box presents three values: the rank-sum p-value, from a rank-sum test of orphans versus ancient proteins (only the property on the y axis is considered), and two correlation p-values, from linear regression tests for orphan and ancient proteins respectively.

For each property, colored lines represent proteins of different age: orphans (red), genus orphans (pink), intermediate (light blue) and ancient (blue). The black lines represent randomly generated proteins at different GC frequencies.

For each property, colored lines represent proteins of different age: orphans (red), genus orphans (pink), intermediate (light blue) and ancient (blue). The black lines represent randomly generated proteins at different GC frequencies.

A black line represents the expected values. The amino acids are sorted by the GC content in their codons.
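One way to obtain such an ordering is sketched below, as an illustration only. It assumes the standard genetic code and assumes that each amino acid is ranked by the mean GC fraction of its synonymous codons; the caption does not state how the ordering was actually computed, and the function names are hypothetical.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def mean_codon_gc():
    """Map each amino acid to the mean GC fraction of its synonymous codons."""
    gc_values = defaultdict(list)
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            for k, c in enumerate(BASES):
                aa = AMINO[16 * i + 4 * j + k]
                if aa == "*":
                    continue  # stop codons encode no amino acid
                codon = a + b + c
                gc_values[aa].append(sum(base in "GC" for base in codon) / 3)
    return {aa: sum(vals) / len(vals) for aa, vals in gc_values.items()}


def amino_acids_sorted_by_gc():
    """Amino acids from lowest to highest mean codon GC (ties broken by letter)."""
    mean_gc = mean_codon_gc()
    return sorted(mean_gc, key=lambda aa: (mean_gc[aa], aa))


if __name__ == "__main__":
    mean_gc = mean_codon_gc()
    for aa in amino_acids_sorted_by_gc():
        print(f"{aa}  {mean_gc[aa]:.3f}")
```

With this ranking, isoleucine sits at the low-GC end and alanine, glycine, and proline share the high-GC end.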

For each scale the Pearson (R) correlation with GC is also shown.
Grants and funding
This work was supported by grants from the Swedish Research Council (http://www.vr.se/, VR-NT 2012-5046, VR-M 2010-3555) and the Swedish E-science Research Center (SeRC, www.e-science.se). Computational resources were provided by the Swedish National Infrastructure for Computing (SNIC, http://www.snic.vr.se/). SL was financed by Bioinformatics Infrastructure for Life Science (BILS, www.bils.se). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.