Structural disorder of plasmid-encoded proteins in Bacteria and Archaea - PubMed

  • Mon Jan 01 2018

Nenad S Mitić et al. BMC Bioinformatics. 2018.

Abstract

Background: In the last decade and a half it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mechanisms: the entropic chain mechanism, which is responsible for rapid fluctuations among many alternative conformations, and molecular recognition via short recognition elements that bind to other molecules. IDPs possess a high adaptive potential, and there is special interest in investigating their involvement in organism evolution.

Results: We analyzed 2554 Bacterial and 139 Archaeal proteomes, with a total of 8,455,194 proteins, for disorder content and its implications for adaptation of organisms, using three disorder predictors and three measures. Along with other findings, we revealed that for all three predictors and all three measures (1) Bacteria exhibit significantly more disorder than Archaea; (2) plasmid-encoded proteins contain considerably more IDRs than proteins encoded on chromosomes (or whole genomes) in both prokaryote superkingdoms; (3) plasmid proteins are significantly more disordered than chromosomal proteins only in the group of proteins with no COG category assigned; (4) antitoxin proteins are the most disordered in comparison to other proteins (almost double) in both Bacterial and Archaeal proteomes; (5) plasmid proteins are more disordered than chromosomal proteins in Bacterial antitoxins and toxin-unclassified proteins, but have almost the same disorder content in toxin proteins.

Conclusion: Our results suggest that while disorder content depends on genome and proteome characteristics, it is more influenced by functional engagements than by gene location (on chromosome or plasmid).

Keywords: Bacteria and Archaea; Intrinsically disordered proteins; Plasmid-encoded proteins; Toxin/antitoxin.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1

Disorder content in Archaea and Bacteria. Disorder content is predicted using three predictors (IUPred-L, IsUnstruct and VSL2b) and three measures

Fig. 2

Disorder content in long (>30AA) disordered regions in Bacteria and Archaea with small proteomes. The disorder content represents the percentage of amino acids in long disordered regions, predicted by the IsUnstruct predictor. Since Archaeal proteome sizes are in the range of 1000 to 4000 proteins, only Bacteria in the same range are selected, in order to emphasize the difference in predicted disorder content between Bacteria and Archaea with similar proteome sizes. The box diagrams in the paper follow the usual representation:
1) the horizontal line inside a box represents the median value (50% of the samples are lower and 50% are higher than the median);
2) the lower box bound represents the first quartile value (25% of the data are lower and 75% are higher than the first quartile);
3) the upper box bound represents the third quartile value (75% of the data are lower and 25% are higher than the third quartile);
4) the box height represents the interquartile range (IQR); in the case of a normal distribution, IQR = 1.35 × σ;
5) the whiskers (vertical lines above and under the box) range up to the highest datum within 1.5 × IQR of the upper quartile and down to the lowest datum within 1.5 × IQR of the lower quartile;
6) the dots above the top whisker and under the bottom whisker represent outliers, i.e. the samples that are out of the whisker range (in some of the diagrams each sample is represented as a dot, and outliers are not specifically highlighted, because it is obvious which samples lie outside the whisker range);
7) in some of the diagrams the red dot represents the mean value
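
The box-plot quantities described in the Fig. 2 caption can be computed directly. The following is a minimal sketch, not the authors' plotting code: the function name `box_plot_stats`, the sample values, and the choice of linear-interpolation quartiles are assumptions made for illustration. It also checks numerically that the IQR of a normal distribution is about 1.35 × σ.

```python
from statistics import NormalDist, quantiles


def box_plot_stats(values):
    """Return the quantities a standard box plot draws for one sample."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed data
    # (one of several common conventions; an assumption of this sketch).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
        "mean": sum(data) / len(data),
    }


# Made-up disorder percentages for ten hypothetical proteomes.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(sample)
print(stats)

# IQR of a normal distribution, expressed in units of sigma.
normal_iqr = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)
print(round(normal_iqr, 2))
```

In this made-up sample the value 30 lies above the upper fence, so it would be drawn as an outlier dot, and the mean (the red dot in some diagrams) is pulled above the median by it.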

Fig. 3

Disorder content in long (>30AA) disordered regions in Bacteria and Archaea per gene location. The disorder content represents the percentage of amino acids in long disordered regions, predicted by the IsUnstruct predictor. The proteomes are divided into protein sets encoded by chromosomal or plasmid DNA. The overall disorder content of the organisms is almost the same as that of the chromosome-encoded proteome subset
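
The measure used in these captions, the percentage of amino acids in long (>30AA) disordered regions, can be illustrated with a short sketch. It assumes that a predictor has already produced one disorder score per residue and that a residue counts as disordered when its score is at least 0.5; the function name `long_disorder_content`, the cutoff, and the example scores are hypothetical and are not taken from the paper or from any predictor's API.

```python
def long_disorder_content(scores, threshold=0.5, min_length=31):
    """Percentage of residues lying in disordered runs of at least min_length.

    scores     -- per-residue disorder scores (hypothetical predictor output)
    threshold  -- score at or above which a residue is called disordered (assumed)
    min_length -- 31 residues, i.e. strictly more than 30 AA
    """
    if not scores:
        return 0.0
    residues_in_long_runs = 0
    run = 0
    # The trailing False is a sentinel that closes a run ending at the last residue.
    for disordered in [s >= threshold for s in scores] + [False]:
        if disordered:
            run += 1
        else:
            if run >= min_length:
                residues_in_long_runs += run
            run = 0
    return 100.0 * residues_in_long_runs / len(scores)


# A made-up 100-residue protein: a 40-residue disordered stretch,
# an ordered core, and a short 10-residue disordered tail.
example = [0.9] * 40 + [0.1] * 50 + [0.8] * 10
print(long_disorder_content(example))  # 40.0

# A run of exactly 30 residues is not "long" under the >30AA rule.
print(long_disorder_content([0.9] * 30 + [0.1] * 70))  # 0.0
```

The short tail is ignored even though its residues are predicted as disordered, which is what distinguishes this measure from a plain percentage of disordered residues.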

Fig. 4

Disorder content in long (>30AA) disordered regions in Bacteria by gene location, as a function of G + C content. Disorder is predicted by the IsUnstruct predictor

Fig. 5

Disorder content in long (>30AA) disordered regions for different clusters of orthologous groups of proteins (COG groups) in Archaea and Bacteria. Disorder is predicted by the IsUnstruct predictor. COG groups are: Cp – Cellular Processes, Isp – Information Storage and Processing, Me – Metabolism, N.C. – Not in COG, Pc – Poorly characterized. The box diagrams in the paper follow the usual representation (see Fig. 2 caption for details)

Fig. 6

Impact of the attributes on disorder content. The variable COG denotes the COG group of a gene/protein (similarly for GC, Superkingdom, Toxin type, Chromosome/plasmid). Bar sizes denote the level of impact of each characteristic on protein disorder. “Importance” on the diagram means impact. The COG group the protein belongs to (N.C., Cp, Isp, Pc, Me) has the highest impact on the percentage of protein disorder (52.25%), followed by the percentage of GC nucleotides (38.60%), while the impact of the other characteristics is considerably lower (Superkingdom - 5.78%, Chromosome/plasmid - 2.96% and Toxin type - 0.41%)

Fig. 7

Disorder content of Bacterial COG groups in plasmids and chromosomes expressed as the percentage of disordered AAs

Fig. 8

Disorder content of different COG categories and data subsets for Bacteria. Plasmid-encoded proteins in the Not in COG (N.C.) and Poorly characterized (Pc) groups have higher disorder content than chromosome-encoded ones, while in most of the categories in the Cellular processes (Cp), Information storage and processing (Isp) and Metabolism (Me) COG groups, plasmid-encoded proteins have similar or lower disorder content than chromosome-encoded ones (Cell motility (N), Cell cycle control, cell division, chromosome partitioning (D) and Intracellular trafficking, secretion, and vesicular transport (U) COG categories in the Cp group; Translation, ribosomal structure and biogenesis (J) COG category in the Isp group; Energy production and conversion (C), Amino acid transport and metabolism (E), Carbohydrate transport and metabolism (G), Lipid transport and metabolism (I), Inorganic ion transport and metabolism (P) and Secondary metabolites biosynthesis, transport, and catabolism (Q) COG categories in the Me group). For all measures and Archaea see Additional file 1: Figure S9

Fig. 9

Percentage of proteins in COG categories for Bacteria. For exact data and the distribution of proteins in COG categories for Archaea, see Additional file 1: Figure S3

Fig. 10

Disorder content in hypothetical proteins in comparison to non-hypothetical proteins for Bacteria. For Archaea and exact data, see Additional file 1: Figure S10

Fig. 11

Disorder level in Toxin and Antitoxin proteins (complete genomes)

Fig. 12

Disorder contents of chromosome- and plasmid-encoded toxin, antitoxin and toxin-unclassified proteins. The disorder content represents the percentage of amino acids in disordered regions, predicted by the IsUnstruct predictor

Fig. 13

Disorder contents of chromosome- and plasmid-encoded toxin, antitoxin and toxin-unclassified proteins according to COG groups for Bacteria. For Archaea and exact data, see Additional file 1: Figure S16
