Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes

Sara Willis et al. Genetics. 2018 Sep.

Abstract

The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
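As a minimal illustration of the abstract's opening statement, the Python sketch below translates one toy nucleotide sequence at three offsets. The sequence is invented, the `translate` helper is a hypothetical name, and the standard genetic code is assumed. It is not the authors' pipeline.

```python
# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate seq starting at offset `frame` (0, 1, or 2).

    Trailing bases that do not fill a codon are ignored; '*' marks a stop.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


toy = "ATGGCATGA"  # invented example sequence, not from the paper
print(translate(toy, 0))  # MA*
print(translate(toy, 1))  # WH
print(translate(toy, 2))  # GM
```

The same nine bases give three unrelated peptides, which is why a mutation in an overlapping region can alter two proteins at once.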

Keywords: alternative reading frame; evolutionary constraint; gene age; mutation-driven evolution; overprinting.

Copyright © 2018 by the Genetics Society of America.

Figures

Figure 1

Three nonmutually-exclusive hypotheses about why overlapping genes have high ISD. The column on the right describes the ISD patterns we would expect if the hypotheses were true.

Figure 2

Statistical classification of relative ages. (A) The receiver operating characteristic plot for determining which member of an overlapping gene pair has higher CAI, and is hence presumed to be ancestral. Only genes with an overlapping region of at least 200 nucleotides are plotted. (B) CAI classification of the 91 gene pairs for which codon usage data were available was based both on P-value and on the length of the overlapping regions. The vertical line shows the overlapping length cutoff of 200 nucleotides, the horizontal line shows the P-value cutoff; CAI classification was considered informative for the 27 bottom right points.
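The two-cutoff rule described for panel (B) can be sketched as a simple filter. The 200-nucleotide length cutoff comes from the caption. The P-value cutoff is not stated here, so it is left as a required argument, and the value used in the example is purely illustrative. `GenePair` and `cai_informative` are hypothetical names, and treating the cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass

MIN_OVERLAP_NT = 200  # overlap length cutoff quoted in the caption


@dataclass
class GenePair:
    name: str        # label for the overlapping pair (made up here)
    overlap_nt: int  # length of the overlapping region in nucleotides
    p_value: float   # P-value of the CAI comparison between the two genes


def cai_informative(pair, p_cutoff):
    """True when the pair falls in the 'bottom right' region of the plot:
    long enough overlap and a small enough P-value."""
    return pair.overlap_nt >= MIN_OVERLAP_NT and pair.p_value <= p_cutoff


# Invented data; 0.05 is an illustrative cutoff, not the paper's value.
pairs = [
    GenePair("pairA", 450, 0.001),
    GenePair("pairB", 120, 0.001),  # overlap too short
    GenePair("pairC", 600, 0.40),   # P-value too large
]
informative = [p.name for p in pairs if cai_informative(p, p_cutoff=0.05)]
print(informative)  # ['pairA']
```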

Figure 3

How the relative ages of the genes were classified for 47 out of the 92 overlapping gene pairs for which sequence data were available. Within each shaded region, each gene pair is counted within only one of the subregions shown. Each shaded region’s total is found by summing the individual subtotals within it, some of which are noted outside the shaded regions. For example, the relative ages of 39 genes were classified via phylogenetic analysis: 20 through phylogenetic analysis alone (blue circle), 18 supported via codon analysis (intersection of blue and white circle), and one contradicted by codon analysis (yellow circle).

Figure 4

ISD results support the birth-facilitation and (preadaptation version of the) conflict resolution hypotheses. (A) Data are from the overlapping sections of the 47 gene pairs whose ages could be classified, from nonoverlapping genes in the species in which those 47 overlapping gene pairs were found, and from the 27 available preoverlapping ancestral homologs. Some numbers in the main text are for the full set of 92 overlapping gene pairs and might not match exactly. (B) While frame significantly impacts disorder content (green is consistently higher than blue), it does not drive the high ISD of overlapping genes. Even controlling for frame, novel ISD > ancestral (left), supporting birth facilitation. Supporting conflict resolution, ancestral > nonoverlapping either inframe (yellow) or matched frameshifted control. The preadaptation version of the constraint hypothesis is supported by the fact that nonoverlapping ISD < preoverlapping (compare the two yellow bars). Means and 66% confidence intervals were calculated from the Box-Cox transformed (with λ = 0.4) means and their SE, and are shown here following back-transformation.
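The last sentence of this caption describes computing means and intervals on a Box-Cox scale and then back-transforming. A minimal Python sketch of that procedure follows, using λ = 0.4 and a 66% level as quoted. The input values are invented, the function names are hypothetical, and the use of a normal quantile (rather than a t quantile) for the interval is an assumption, so this should not be read as the authors' exact calculation.

```python
import math
from statistics import NormalDist, mean, stdev

LAMBDA = 0.4  # Box-Cox parameter quoted in the caption


def boxcox(x, lam=LAMBDA):
    """Box-Cox transform for nonzero lambda: (x**lam - 1) / lam."""
    return (x ** lam - 1.0) / lam


def inv_boxcox(y, lam=LAMBDA):
    """Inverse transform; the base is clamped at 0 to stay in the real domain."""
    return max(lam * y + 1.0, 0.0) ** (1.0 / lam)


def backtransformed_mean_ci(values, level=0.66, lam=LAMBDA):
    """Mean and confidence interval computed on the transformed scale,
    then mapped back to the original scale."""
    t = [boxcox(v, lam) for v in values]
    m = mean(t)
    se = stdev(t) / math.sqrt(len(t))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # assumed normal quantile
    return inv_boxcox(m, lam), inv_boxcox(m - z * se, lam), inv_boxcox(m + z * se, lam)


# Invented ISD fractions, for illustration only.
isd = [0.2, 0.3, 0.4, 0.5]
center, lower, upper = backtransformed_mean_ci(isd)
print(center, lower, upper)
```

Because the transform is nonlinear, the back-transformed interval is generally asymmetric around the back-transformed mean, and that mean differs from the arithmetic mean of the raw values.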

Figure 5

(A) The ISD of a nonoverlapping gene predicts the ISD of the non-ORF frameshifted version of that sequence. Model I regression lines are statistically indistinguishable in our set of 150 nonoverlapping viral genes vs. our set of 27 preoverlapping genes, which are therefore pooled (R² = 0.30 and R² = 0.24 for the +1 and +2 groups, respectively). (B) The overlapping ISD of the 47 gene pairs with classifiable relative ages. Each datapoint represents one overlapping pair. The regression lines from (A) are superimposed to illustrate the elevated ISD of novel genes born into the already intrinsically high-ISD +2 frame, which destroys the correlation (R² = 6.6 × 10⁻⁵ for +2 in contrast to 0.26 for +1).
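For readers unfamiliar with the terms, a Model I regression is an ordinary least-squares fit of y on x, and R² is the fraction of variance in y that the fitted line explains. The sketch below computes both from scratch. The data points are invented and `ols_fit` is a hypothetical helper, so the numbers bear no relation to those in the figure.

```python
def ols_fit(x, y):
    """Ordinary least-squares (Model I) fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Invented values: in-frame ISD (x) vs. frameshifted ISD (y).
x = [0.10, 0.20, 0.30, 0.40, 0.50]
y = [0.22, 0.31, 0.35, 0.52, 0.55]
slope, intercept, r2 = ols_fit(x, y)
print(slope, intercept, r2)
```

An R² near zero, like the +2 value quoted in the caption, means the in-frame ISD carries almost no information about the ISD in the other frame for that group.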
