Amino-Acid Characteristics in Protein Native State Structures - PubMed
Mon Jan 01 2024
Tatjana Škrbić et al. Biomolecules. 2024.
Abstract
The molecular machines of life, proteins, are made up of twenty kinds of amino acids, each with a distinctive side chain. We present a geometrical analysis of the protrusion statistics of side chains in more than 4000 high-resolution protein structures. We employ a coarse-grained representation of the protein backbone, viewed as a linear chain of Cα atoms, and consider just the heavy atoms of the side chains. We study the large variety of behaviors of the amino acids based on both rudimentary structural chemistry and geometry. Our geometrical analysis uses a backbone Frenet coordinate system for the common study of all amino acids. Our analysis underscores the richness of the repertoire of amino acids that is available to nature to design protein sequences that fit within the putative native state folds.
Keywords: amino-acid classes; local Frenet frame; pre-sculpted landscape; side-chain protrusion.
Conflict of interest statement
The authors declare no conflict of interest.
Figures

Two-dimensional projections of the mean maximal protrusion of nineteen amino acids in more than 4000 high-resolution structures of globular proteins. For ease of visualization, we show three two-dimensional views: (a) in the anti-normal–binormal plane; (b) in the anti-normal–tangent plane; and (c) in the binormal–tangent plane. The color code of the protrusion vectors follows that employed in Table 2. The black X symbols in all three panels denote the end point of the projection, onto the corresponding plane, of the mean protrusion vector calculated for all amino acids in our data set.

Native state of bacteriophage T4 lysozyme (PDB code: 2LZM) in the CPK representation [23,24] in which all heavy atoms of the protein backbone and its side chains are represented as spheres with radii proportional to their respective van der Waals atomic radii. Color code: carbon (cyan), oxygen (red), nitrogen (blue), and sulfur (yellow). The side chains in the protein interior are very well packed.

Local Frenet frame of the amino acid i. The three consecutive Cα atoms are at points i − 1, i, and i + 1 and lie in the plane of the paper. The point O is at the center of a circle passing through them. Please see text for a description of the orthonormal basis set.
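The construction described in this caption can be sketched numerically. The snippet below is a minimal illustration, not the authors' code. It assumes that the normal points from point i toward the circle center O, that the binormal is perpendicular to the plane of the three Cα atoms, and that the tangent completes a right-handed set (tangent × normal = binormal), so that it runs from i − 1 toward i + 1. The function name `frenet_frame` and the small vector helpers are hypothetical. Under these assumptions, the anti-normal is simply the negative of the normal.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def frenet_frame(p_prev, p_i, p_next):
    """Return (tangent, normal, binormal) at p_i from three consecutive points.

    Assumed convention (not taken from the paper's text): the normal points
    from p_i toward the center O of the circle through the three points.
    """
    u = sub(p_i, p_prev)
    v = sub(p_next, p_prev)
    w = cross(u, v)                      # perpendicular to the plane of the points
    w2 = dot(w, w)
    if w2 < 1e-12:
        raise ValueError("collinear points: the circle is undefined")
    # Circumcenter of the triangle, measured from p_prev.
    vw = cross(v, w)
    wu = cross(w, u)
    uu, vv = dot(u, u), dot(v, v)
    center = tuple(p + (uu * a + vv * b) / (2.0 * w2)
                   for p, a, b in zip(p_prev, vw, wu))
    normal = unit(sub(center, p_i))
    binormal = unit(w)
    tangent = cross(normal, binormal)    # unit length because normal ⟂ binormal
    return tangent, normal, binormal

# Three points on the unit circle in the xy-plane (made-up coordinates).
t, n, b = frenet_frame((-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0))
anti_normal = tuple(-x for x in n)       # points away from the circle center
```

For real structures the three points would be the coordinates of consecutive Cα atoms; the first and last residues of a chain have no such triplet and would need to be skipped.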

(a) Probability distribution of the projections (cos θ values) of the maximally protruding directions of amino-acid side chains onto the anti-normal directions of their local Frenet frames, for ~900,000 non-glycine residues in more than 4000 high-resolution structures of globular proteins. (b) Probability distributions of the cos θ values for three subsets of consecutive Cα triplets: those belonging to α-helical segments (red histogram), those belonging to β-strands (blue histogram), and those located in protein loops.
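For a single residue, the quantity histogrammed here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the protrusion direction is taken to be the vector from the Cα atom to the maximally protruding side-chain atom, the anti-normal direction is supplied by the caller, and the function name `cos_theta` is hypothetical.

```python
import math

def cos_theta(ca, atom, anti_normal):
    """Cosine of the angle between the Cα -> atom direction and the anti-normal.

    ca, atom: 3D coordinates of the Cα atom and of the maximally protruding atom.
    anti_normal: anti-normal direction of the local frame (any nonzero length).
    """
    d = [a - c for a, c in zip(atom, ca)]
    d_norm = math.sqrt(sum(x * x for x in d))
    n_norm = math.sqrt(sum(x * x for x in anti_normal))
    if d_norm == 0.0 or n_norm == 0.0:
        raise ValueError("zero-length vector")
    return sum(x * y for x, y in zip(d, anti_normal)) / (d_norm * n_norm)

# Made-up coordinates: an atom displaced at 45 degrees from the anti-normal.
value = cos_theta((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0))
```

A value of 1 means the side chain points straight along the anti-normal, 0 means it is perpendicular to it, and −1 means it points toward the circle center.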

Gallery of nineteen amino acids (with glycine excluded). Three-letter amino acid codes are used. For each amino acid, the maximally protruding atom is shown along with the frequency with which it occurs. The color code of the atoms is: carbon Cα in green, carbon C atoms other than Cα in turquoise, oxygen O atoms in red, nitrogen N atoms in dark blue, and sulfur S atoms in yellow. Cα atoms (green spheres) are drawn with a slightly larger radius than the other C atoms (turquoise spheres) to enhance visibility. The degree of protrusion of a given side-chain atom with respect to the backbone is defined as the distance of the atom from the corresponding Cα atom. The color code of the amino-acid labels follows that in Table 2. We note that we have adopted the atom names assigned in the PDB file, and this makes the branching numbers assigned to chemically identical atoms spurious: the NH1 and NH2 atoms in ARG, the OE1 and OE2 atoms in GLU, and the OD1 and OD2 atoms in ASP are indistinguishable. Nevertheless, we follow the atom nomenclature of the PDB files.
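The protrusion measure defined in this caption (distance of a side-chain heavy atom from the residue's Cα atom) can be sketched for one residue as below. This is an illustrative sketch rather than the authors' code: the function name `max_protrusion`, the input layout (a dictionary from PDB atom names to coordinates, already restricted to heavy atoms), and the set of backbone atom names to skip are assumptions.

```python
import math

# Backbone heavy-atom names in PDB nomenclature; everything else in the
# residue is treated as a side-chain atom (assumption of this sketch).
BACKBONE = {"N", "CA", "C", "O", "OXT"}

def max_protrusion(atoms):
    """Return (atom_name, distance) for the side-chain heavy atom farthest
    from the Cα atom, or None when there are no side-chain heavy atoms
    (as for glycine).

    atoms: dict mapping PDB atom names to (x, y, z) coordinates in angstroms.
    """
    ca = atoms["CA"]
    side_chain = {name: xyz for name, xyz in atoms.items()
                  if name not in BACKBONE}
    if not side_chain:
        return None
    name = max(side_chain, key=lambda k: math.dist(side_chain[k], ca))
    return name, math.dist(side_chain[name], ca)

# Made-up coordinates for a toy residue, for illustration only.
residue = {
    "N": (-1.2, 0.8, 0.0),
    "CA": (0.0, 0.0, 0.0),
    "C": (1.2, 0.9, 0.0),
    "O": (1.1, 2.1, 0.0),
    "CB": (0.0, -1.5, 0.0),
    "CG": (0.0, -2.0, 1.5),
}
name, r_max = max_protrusion(residue)   # farthest atom here is CG at 2.5 Å
```

Applying this to every non-glycine residue in a set of structures and grouping the distances by amino-acid type would give per-residue-type distributions of the kind summarized as Rmax on this page.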

Histogram of the maximal protrusion Rmax of amino acids in more than 4000 high-resolution structures of globular proteins. The 19 amino acids (glycine is excluded because it has no heavy side-chain atoms) are denoted with three-letter amino acid codes and are colored according to the amino acid classification summarized in Table 2. The mean values of Rmax for each of the amino acids are shown as black X symbols, while the colored rectangles have a width that corresponds to the standard deviation.

Sketches of the histograms of Rmax and conformations associated with the multiple modes for six amino acids: (a) ILE; (b) TRP; (c) LYS; (d) HIS; (e) GLN; and (f) MET. For each set of rotamers, the Cα and Cβ atoms are superimposed to better visualize the distinction between the conformations. The arrows link the maximally protruding atom to the corresponding mode in the Rmax frequency distribution. The atoms are color coded: carbon Cα in green, carbon C atoms other than Cα in turquoise, oxygen O atoms in red, nitrogen N atoms in blue, and sulfur S atoms in yellow.

Side and top views of the folds adopted by the highly similar amino acid sequences shown in Table 4. The GA sequences adopt the topology of a three-helix bundle (3α fold), while the GB sequence adopts an α+4β fold. In all panels, the pink ribbons denote the portions of a chain that adopt the α-helical conformation, while the yellow ribbons form β-strands. Parts of the backbone that do not belong to any secondary structure element are shown in light gray. The darker gray spheres represent the positions of Cα atoms and, for ease of visibility, are drawn with a radius that is only 30% of the van der Waals radius of a carbon atom. In contrast, the heavy side-chain atoms of the key amino acids responsible for changes in protein function and stability are assigned the van der Waals radii of their constituent atom types. Heavy atoms of ILE residues are shown in blue, LEU in green, TYR in red, and PHE in orange. Panels (a1,a2) show the side and top views, respectively, of the α+4β topology of Protein G (GB98 sequence). Panels (b1,b2) show side and top views of a ‘non-existent’ 3α fold for the same sequence as in Panels (a1,a2). Panels (c1,c2) show the side and top views of the marginally stable GA98 sequence, whereas Panels (d1,d2) show the side and top views of the stable GA95 sequence. This stability is acquired by a single mutation from PHE to ILE at position 30 (see Table 4).
References
- Creighton T.E. Proteins: Structures and Molecular Properties. W. H. Freeman; New York, NY, USA: 1993.
- Lesk A.M. Introduction to Protein Science: Architecture, Function and Genomics. Oxford University Press; Oxford, UK: 2004.
- Bahar I., Jernigan R.L., Dill K.A. Protein Actions. Garland Science; New York, NY, USA: 2017.
- Berg J.M., Tymoczko J.L., Gatto G.J., Jr., Stryer L. Biochemistry. Macmillan Learning; New York, NY, USA: 2019.