The origins of polypeptide domains
Review
Edward E Schmidt et al. Bioessays. 2007 Mar.
Abstract
Three decades ago Gilbert posited that novel proteins arise by re-shuffling genomic sequences encoding polypeptide domains. Today, with numerous genomes and countless genes sequenced, it is well established that recombination of sequences encoding polypeptide domains plays a major role in protein evolution. There is, however, less evidence to suggest how the novel polypeptide domains, themselves, arise. Recent comparisons of genomes from closely related species have revealed numerous species-specific exons, supporting models of domain origin based on "exonization" of intron sequences. Also, a mechanism for the origin of novel polypeptide domains has been proposed based on analyses of insertion-based polymorphisms between orthologous genes across broad phylogenetic spectra and between allelic variants of genes within species. This review discusses these processes and how each might participate in the evolutionary emergence of novel polypeptide domains.
Figures

Genesis of novel polypeptide domains by “exonization” of intron sequences. At top is shown a cartoon of a simple gene (exons green, introns uncolored, simplified splice donor and acceptor signals diagrammed in blue and red, respectively, promoter sequences in yellow, initiation site as a bent arrow, polyadenylation signals in orange, and splicing pattern as bent lines). Below, acquisition of splice donor and acceptor signals within an existing intron can generate a novel exon (labeled “N”); however, this exon must preserve the phase of the exon–exon junctions. Alternative splicing can produce either the original protein or the modified version with the novel polypeptide. This novel exon might subsequently shuffle to other parts of the genome.
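
The phase constraint in this caption can be illustrated with a minimal sketch. It is not taken from the review: the function names and the simplified rules are assumptions made for illustration. It assumes that a novel exon inserted into an intron keeps the downstream reading frame only if its length is a multiple of three, and that it must not introduce an in-frame stop codon. The codon that spans the upstream junction is not checked, and real splicing requirements are more complex.

```python
# Hypothetical helper names; a simplified model of the phase constraint only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def preserves_phase(exon_seq: str) -> bool:
    """True if inserting the exon leaves the downstream reading frame unchanged."""
    return len(exon_seq) % 3 == 0


def has_in_frame_stop(exon_seq: str, intron_phase: int) -> bool:
    """Check the complete codons inside the exon for a stop codon.

    intron_phase is the number of nucleotides (0, 1 or 2) of an unfinished
    codon carried over from the upstream exon. The junction-spanning codon
    itself is not examined here.
    """
    if intron_phase not in (0, 1, 2):
        raise ValueError("intron_phase must be 0, 1 or 2")
    seq = exon_seq.upper()
    # Skip the nucleotides that complete the codon started upstream.
    offset = (3 - intron_phase) % 3
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def is_viable_novel_exon(exon_seq: str, intron_phase: int) -> bool:
    """Both simplified conditions: frame preserved and no in-frame stop."""
    return preserves_phase(exon_seq) and not has_in_frame_stop(exon_seq, intron_phase)


if __name__ == "__main__":
    print(is_viable_novel_exon("GCCGCAGCC", 0))  # length 9, no stop codon in frame
    print(is_viable_novel_exon("GCCTAAGCC", 0))  # contains TAA in frame
```

Under this simplified model, a candidate exon that fails either check would either shift the frame of all downstream exons or truncate the protein, which is one way to read the caption's requirement that the exon preserve the phase of the exon–exon junctions.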

Exon- and primary amino acid-structures of vertebrate TBP. At top is depicted the protein-coding exon arrangement of higher vertebrate and cyclostome TBP mRNAs.(27) Heavy lines depict the relevant region of each mRNA, triangles indicate exon–exon junctions, numbers are exon designations. For cyclostomes, the exon structure has only been determined for the N-terminal region; the C-terminal region is labeled “???”. A cyclostome-specific intron divides the region homologous to higher vertebrate exon 3 into separate left and right exons (designated 3L and 3R).(24) Below is the primary amino acid structure of vertebrate TBP with subregions and repeat units emphasized. Below this is the linear distribution of phylogenetic histories of each region along the sequence. Note that boundaries in phylogenetic history do not correlate with exon boundaries. The TBPCORE domain is encoded by an inverted repeat (arrows) in all phyla, resulting in the conserved symmetrical “saddle” structure of the protein;(21,54) other recognized repeat units are tandem.(24) At the bottom is a key of the various oligopeptide units recognizable in hagfish TBP, indicating the number of repeats of each unit seen in human and hagfish TBPs.(24) Due to mutation and genetic drift, some repeat units are less obvious in higher vertebrates.(24) The two regions having no recognizable pattern (*) are dissimilar to each other and are not obviously derived by duplication and deletion of adjacent sequences; however, like the rest of the N terminus, these regions might have arisen by insertion/deletion of sequences whose ancestry is not recognized. Q repeat lengths are given as number of Q-codons (**) because, even though this region can expand and contract as larger multi-codon units, this has only been documented once.(48) Normal human tbp genes have 36–38 Qs in the repeat region; however, patients with SCA17 neuropathologies have severely expanded TBP Q-domains.(48,49)

Alignment of predicted amino acid sequences of the alpha-2 domain of two bison MHC-I alleles. Allele Bibi-N*00501 is a novel allele with a seven amino acid duplication following position 163 of the parent allele, Bibi-N*00502. Numbering corresponds to amino acid positions of normal bovine or bison MHC-I proteins starting from the first amino acid of the mature protein. Dots in the Bibi-N*00501 sequence indicate amino acid identity with Bibi-N*00502. The donor sequence is highlighted in green and the duplicated peptide in allele Bibi-N*00501 is highlighted in blue. The symbols above the Bibi-N*00502 sequence indicate amino acids of typical MHC-I proteins (e.g. Bibi-N*00502) that are predicted to contact either the bound oligopeptide or the T-cell receptor: * predicted to contact the peptide, + predicted to contact the T-cell receptor, ∼ predicted to contact both.(55) The seven amino acid duplication in Bibi-N*00501 is in the middle of the α-helix that contacts both the peptide antigen and the T-cell receptor.
References
- Gilbert W. Why genes in pieces? Nature. 1978;271:501.
- Greer JM, Puetz J, Thomas KR, Capecchi MR. Maintenance of functional equivalence during paralogous Hox gene evolution. Nature. 2000;403:661–665.
- Li WH, Gu Z, Cavalcanti AR, Nekrutenko A. Detection of gene duplications and block duplications in eukaryotic genomes. J Struct Funct Genomics. 2003;3:27–34.
- Hoegg S, Meyer A. Hox clusters as models for vertebrate genome evolution. Trends Genet. 2005;21:421–424.
- de Souza SJ, Long M, Gilbert W. Introns and gene evolution. Genes Cells. 1996;1:493–505.