Review

The origins of polypeptide domains

Edward E Schmidt et al. Bioessays. 2007 Mar.

Abstract

Three decades ago, Gilbert posited that novel proteins arise by reshuffling genomic sequences encoding polypeptide domains. Today, with numerous genomes and countless genes sequenced, it is well established that recombination of sequences encoding polypeptide domains plays a major role in protein evolution. There is, however, less evidence about how the novel polypeptide domains themselves arise. Recent comparisons of genomes from closely related species have revealed numerous species-specific exons, supporting models of domain origin based on "exonization" of intron sequences. In addition, a mechanism for the origin of novel polypeptide domains has been proposed based on analyses of insertion-based polymorphisms, both between orthologous genes across broad phylogenetic spectra and between allelic variants of genes within species. This review discusses these processes and how each might participate in the evolutionary emergence of novel polypeptide domains.


Figures

Figure 1

Genesis of novel polypeptide domains by “exonization” of intron sequences. The top panel shows a cartoon of a simple gene (exons in green, introns uncolored, simplified splice donor and acceptor signals in blue and red, respectively, promoter sequences in yellow, the initiation site as a bent arrow, polyadenylation signals in orange, and the splicing pattern as bent lines). Below, acquisition of splice donor and acceptor signals within an existing intron can generate a novel exon (labeled “N”); however, this exon must preserve the phase of the exon–exon junctions so that the downstream reading frame is maintained. Alternative splicing can produce either the original protein or a modified version containing the novel polypeptide. The novel exon might subsequently be shuffled to other parts of the genome.
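The phase constraint reduces to simple arithmetic. This is a minimal sketch, assuming the usual convention that an intron has phase 0, 1 or 2 depending on whether it falls between codons, after the first base of a codon, or after the second. An exon of length L inserted into an intron of phase p ends at phase (p + L) mod 3, so the original downstream frame is kept only when L is a multiple of three. The function names are hypothetical, chosen for illustration.

```python
def downstream_phase(upstream_phase: int, exon_length: int) -> int:
    """Phase of the intron that follows an exon, given the phase of the intron before it.

    Phase 0: intron lies between codons; phase 1: after the first base of a codon;
    phase 2: after the second base.
    """
    if upstream_phase not in (0, 1, 2):
        raise ValueError("intron phase must be 0, 1 or 2")
    if exon_length <= 0:
        raise ValueError("exon length must be positive")
    return (upstream_phase + exon_length) % 3


def keeps_reading_frame(intron_phase: int, exon_length: int) -> bool:
    """True if a new exon spliced into an intron of this phase leaves the downstream frame unchanged."""
    return downstream_phase(intron_phase, exon_length) == intron_phase


print(keeps_reading_frame(1, 21))  # True: 21 nt = 7 codons
print(keeps_reading_frame(1, 20))  # False: the downstream frame shifts
```

Any exon whose length is a multiple of three passes this check regardless of the intron's phase, which is why such exons can be gained (or lost) without disrupting the rest of the protein.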

Figure 2

Exon structure and primary amino acid structure of vertebrate TBP. At top is the protein-coding exon arrangement of higher-vertebrate and cyclostome TBP mRNAs.(27) Heavy lines depict the relevant region of each mRNA, triangles indicate exon–exon junctions, and numbers are exon designations. For cyclostomes, the exon structure has been determined only for the N-terminal region; the C-terminal region is labeled “???”. A cyclostome-specific intron divides the region homologous to higher-vertebrate exon 3 into separate left and right exons (designated 3L and 3R).(24) Below is the primary amino acid structure of vertebrate TBP with subregions and repeat units emphasized, and below that the linear distribution of the phylogenetic histories of each region along the sequence. Note that boundaries in phylogenetic history do not correlate with exon boundaries. The TBPCORE domain is encoded by an inverted repeat (arrows) in all phyla, resulting in the conserved symmetrical “saddle” structure of the protein;(21,54) the other recognized repeat units are tandem.(24) At the bottom is a key to the various oligopeptide units recognizable in hagfish TBP, indicating the number of repeats of each unit in human and hagfish TBPs.(24) Owing to mutation and genetic drift, some repeat units are less obvious in higher vertebrates.(24) The two regions with no recognizable pattern (*) are dissimilar to each other and are not obviously derived by duplication and deletion of adjacent sequences; however, like the rest of the N terminus, these regions might have arisen by insertion/deletion of sequences whose ancestry is not recognized. Q-repeat lengths are given as the number of Q codons (**) because, although this region can expand and contract as larger multi-codon units, this has been documented only once.(48) Normal human tbp genes have 36–38 Qs in the repeat region; however, patients with SCA17 neuropathologies have severely expanded TBP Q domains.(48,49)
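Because Q-repeat lengths are reported as counts of Q codons, the measurement can be sketched as a scan of an in-frame coding sequence for the longest uninterrupted run of glutamine codons (CAA and CAG). This is a hypothetical helper, not the authors' procedure; it assumes the input starts at the first base of a codon and ignores any trailing partial codon. The toy sequence is invented.

```python
GLN_CODONS = {"CAA", "CAG"}  # the two codons for glutamine (Q)


def longest_gln_codon_run(cds: str) -> int:
    """Length, in codons, of the longest uninterrupted run of Q codons in an in-frame CDS."""
    cds = cds.upper()
    best = current = 0
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(cds) - len(cds) % 3, 3):
        if cds[start:start + 3] in GLN_CODONS:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


toy_cds = "ATG" + "CAG" * 3 + "CAA" * 2 + "CCT" + "CAG" * 4  # invented example
print(longest_gln_codon_run(toy_cds))  # 5
```

Since both CAA and CAG are counted, a mixed CAG/CAA tract is scored as a single run; only a non-glutamine codon (here CCT) breaks it.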

Figure 3

Alignment of the predicted amino acid sequences of the alpha-2 domain of two bison MHC-I alleles. Allele Bibi-N*00501 is a novel allele with a seven-amino-acid duplication following position 163 of the parent allele, Bibi-N*00502. Numbering corresponds to amino acid positions of normal bovine or bison MHC-I proteins, starting from the first amino acid of the mature protein. Dots in the Bibi-N*00501 sequence indicate amino acid identity with Bibi-N*00502. The donor sequence is highlighted in green and the duplicated peptide in allele Bibi-N*00501 in blue. Symbols above the Bibi-N*00502 sequence mark amino acids of typical MHC-I proteins (e.g. Bibi-N*00502) that are predicted to contact the bound oligopeptide, the T-cell receptor, or both: * predicted to contact the peptide; + predicted to contact the T-cell receptor; ∼ predicted to contact both.(55) The seven-amino-acid duplication in Bibi-N*00501 lies in the middle of the α-helix that contacts both the peptide antigen and the T-cell receptor.
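An insertion polymorphism like the one between Bibi-N*00502 and Bibi-N*00501 can be located by comparing the two sequences directly. The sketch below is a hypothetical helper that assumes the variant differs from the parent by exactly one contiguous insertion; it reports where the insertion falls and where an identical donor copy occurs in the parent. Because a tandem duplication can be placed equally well at several positions, it reports the rightmost placement. The sequences are invented toy examples, not the bison MHC-I sequences.

```python
def find_insertion(parent: str, variant: str) -> tuple[int, str, int]:
    """Locate a single contiguous insertion that turns `parent` into `variant`.

    Returns (number of parent residues before the insertion, inserted segment,
    index of an identical donor copy in `parent`, or -1 if there is none).
    For tandem duplications the placement is ambiguous; extending the shared
    prefix as far as possible yields the rightmost valid placement.
    """
    k = len(variant) - len(parent)
    if k <= 0:
        raise ValueError("variant must be longer than parent")
    # Extend the shared prefix as far as possible.
    i = 0
    while i < len(parent) and parent[i] == variant[i]:
        i += 1
    inserted = variant[i:i + k]
    # Everything after the inserted segment must match the rest of the parent.
    if variant[i + k:] != parent[i:]:
        raise ValueError("sequences do not differ by a single insertion")
    return i, inserted, parent.find(inserted)


parent = "MKTAYIAKQR"      # invented toy sequence
variant = "MKTAYIAYIAKQR"  # toy variant: "AYI" duplicated after residue 6
print(find_insertion(parent, variant))  # (7, 'YIA', 4)
```

The toy result shows the ambiguity: "AYI" inserted after residue 6 and "YIA" inserted after residue 7 produce the same variant. In the bison alleles the inserted segment is seven residues (21 nucleotides), a multiple of three, so the sequence downstream of the duplication stays in frame.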

References

    1. Gilbert W. Why genes in pieces? Nature. 1978;271:501.
    2. Greer JM, Puetz J, Thomas KR, Capecchi MR. Maintenance of functional equivalence during paralogous Hox gene evolution. Nature. 2000;403:661–665.
    3. Li WH, Gu Z, Cavalcanti AR, Nekrutenko A. Detection of gene duplications and block duplications in eukaryotic genomes. J Struct Funct Genomics. 2003;3:27–34.
    4. Hoegg S, Meyer A. Hox clusters as models for vertebrate genome evolution. Trends Genet. 2005;21:421–424.
    5. de Souza SJ, Long M, Gilbert W. Introns and gene evolution. Genes Cells. 1996;1:493–505.
