Exploring the sequence fitness landscape of a bridge between protein folds - PubMed
Wed Jan 01 2020
Pengfei Tian et al. PLoS Comput Biol. 2020.
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences which fold into different structures upon a change of environment or a single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences which can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity of certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is still a wide range of sequences which are close to the transition region and are therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
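The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes a Potts-like form for each coevolutionary model (per-site fields `h` and pairwise couplings `J`) and a soft-minimum rule for combining the two energies; the abstract does not state the actual functional form or combination rule, so both are assumptions made here for illustration. The function names (`potts_energy`, `soft_min`, `metropolis`) and the random toy parameters are hypothetical, and energies are taken to be in units of kBT.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Potts-like energy: E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]."""
    L = len(seq)
    e = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def soft_min(e_a, e_b):
    """Assumed combination rule (not from the paper): -log(exp(-e_a) + exp(-e_b))."""
    return -np.logaddexp(-e_a, -e_b)

def metropolis(seq, energy_fn, n_steps, q, rng):
    """Metropolis sampling over single-point mutations at unit temperature."""
    seq = seq.copy()
    e = energy_fn(seq)
    for _ in range(n_steps):
        i = rng.integers(len(seq))      # position to mutate
        new = rng.integers(q)           # proposed residue type
        old = seq[i]
        if new == old:
            continue
        seq[i] = new
        e_new = energy_fn(seq)
        if e_new <= e or rng.random() < np.exp(e - e_new):
            e = e_new                   # accept the mutation
        else:
            seq[i] = old                # reject and restore
    return seq, e

# Toy parameters: random numbers standing in for fitted models of two folds.
rng = np.random.default_rng(0)
L, q = 8, 4
h_a, h_b = rng.normal(size=(L, q)), rng.normal(size=(L, q))
J_a = rng.normal(scale=0.1, size=(L, L, q, q))
J_b = rng.normal(scale=0.1, size=(L, L, q, q))

def combined(s):
    return soft_min(potts_energy(s, h_a, J_a), potts_energy(s, h_b, J_b))

start = rng.integers(q, size=L)
final, e_final = metropolis(start, combined, 500, q, rng)
```

In a real application the fields and couplings would come from a fit to each family's sequence alignment, and the alphabet size `q` would cover the 20 amino acids (plus a gap state if the model uses one).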
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures

Sequence-based models for the GA and GB domains of streptococcal protein G. Many sequences (A) fold to each structure (B): e.g. structures of three naturally occurring sequences with the GA fold (PDB IDs 2fs1, 1gjs and 2j5y) and three with the GB fold (PDB IDs 1pga, 2lum and 1igd) are shown on the left and right respectively. Contacts between pairs of residues in the native structure impose mutual constraints on the types of residues which can occupy these positions in the sequence alignment. For instance, strong covariance is detected between the amino acids at residues 21 and 30 for GA sequences and between residues 46 and 51 for GB sequences. The Cβ atoms of these residues are shown as yellow spheres. The UniProtKB IDs of the example sequences for GA are Q51918_FINMA, G5KGV3_9STRE, G5K7M6_9STRE and Q56192_STAXY; those for GB are SPG1_STRSG, E4KPW8_9LACT, F9P4J6_STRCV and G5JZF8_9STRE. (C) Simple model for the emergence of new folds via evolutionary drift in sequence space between basins of attraction corresponding to the GA and GB domains.

(A) Distribution of E_GA for the GA homologs used to parameterize E_GA (cyan), synthetic sequences dominated by the GA fold at equilibrium (blue), unstable synthetic sequences (yellow) and randomly generated sequences (grey). (B) Distribution of E_GB for the GB homologs used to parameterize E_GB (purple), synthetic sequences dominated by the GB fold at equilibrium (red), unstable synthetic sequences (yellow) and randomly generated sequences (grey). (C) The correlation between the folding temperature (Tm) and E_GA for synthetic sequences of GA. Stable mutants are shown as blue symbols; unstable mutants are shown as yellow symbols, with Tm set to 20°C for plotting purposes. (D) The correlation between Tm and E_GB for experimental mutants of GB (stable: red; unstable: yellow, with Tm set to 20°C).

(A) The committor for reaching the GA fold, ϕ_A, is plotted for the experimentally characterized mutant sequences with blue (GA fold) and red (GB fold) symbols. The mean and standard deviation of ϕ_A for an equilibrium sample of sequences at given values of the optimized coordinate E_{A-B}^{opt} are shown by black symbols and error bars. The theoretical committor from a 1D diffusion model is shown in yellow. (B) The ϕ_A values (colours) are projected onto E_GA and E_GB for each sequence. Purple and blue broken lines are perpendicular to the original coordinate E_{A-B} = E_GA − E_GB and the optimized coordinate E_{A-B}^{opt} = λE_GA − E_GB respectively (λ = 1.13). (C) Free energy profile of the combined model for the natural mutations (blue), natural mutations with stability constraints (green) and the binary mutations (red). The free energy (in sequence space) was estimated using the weighted histogram analysis method, based on umbrella sampling on the coordinate E_{A-B}^{opt}. (D) The profile of position-dependent diffusion coefficients for the natural mutations (blue) and the binary mutations (red).
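For readers unfamiliar with the "theoretical committor from a 1D diffusion model", the standard expression for diffusion on a free energy profile F(x) (in kBT) with position-dependent diffusion coefficient D(x) is ϕ_A(x) = ∫ from x_B to x of exp(F)/D, divided by the same integral from x_B to x_A. The sketch below evaluates it numerically. It is a generic illustration, not the authors' code: the function name `committor_1d`, the double-well profile and the constant D are toy assumptions, and the orientation of the coordinate (basin B at the first grid point, basin A at the last) is chosen only for convenience.

```python
import numpy as np

def committor_1d(x, F, D):
    """Committor for 1D diffusion: probability of reaching x[-1] (A) before x[0] (B).

    x: increasing grid, F: free energy in kBT, D: diffusion coefficient on the grid.
    """
    # Shift F for numerical stability; the constant cancels in the ratio.
    w = np.exp(F - F.max()) / D
    # Trapezoidal cumulative integral of exp(F)/D from x[0] to each grid point.
    seg = 0.5 * (w[1:] + w[:-1]) * np.diff(x)
    cum = np.concatenate(([0.0], np.cumsum(seg)))
    return cum / cum[-1]

# Toy inputs: symmetric double well with a 5 kBT barrier and constant diffusion.
x = np.linspace(-1.0, 1.0, 201)
F = 5.0 * (x**2 - 1.0) ** 2
D = np.ones_like(x)
phi = committor_1d(x, F, D)
```

Because the integrand is dominated by the barrier region, the committor changes most steeply there and is nearly flat inside each basin, which is why sequences with ϕ_A near 0.5 are used to mark the transition region.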

(A) Potential energy landscape of the combined model. (B) Contribution of entropy to the free energy. (C) 2D free energy landscape of the fold switch for natural mutation simulations. (D) Three example transition paths from the GA basin to the GB basin. Example transition paths (E) with stability constraints (the shaded, crossed box marks the forbidden region where one or both folds are predicted to be unstable) and (F) using only “binary” mutations. The free energy surface in (F) is the one for which only binary mutations are allowed. All energies are in units of k_BT.

In each plot, the dashed line represents the boundary between the two subsets described by Uversky [80]. We calculated (q, h) for 10000 randomly selected transition sequences, defined as having ϕ_A within [0.49, 0.51], from the simulations (A) without and (B) with stability constraints (one symbol for each sequence; probability density contours containing 10, 40 and 70% of the data are also shown). The (q, h) values of 694 known IDPs from the DisProt database [81] and 7957 globular proteins from the Top8000 database [82] are shown in (C) and (D) respectively. The GA and GB wild-type sequences are shown with cyan and purple stars, respectively, in (B). (E) and (F) are heat-map and contour-map representations, respectively, of the IDP propensity P(IDP|q, h). The legends (%) give the probability of being an IDP, P(IDP|q, h), for each (q, h) combination.
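The caption does not define q and h. A minimal sketch follows under the assumption that they are the mean net charge and mean hydrophobicity of Uversky's charge-hydropathy plot, with hydrophobicity taken from the Kyte-Doolittle scale rescaled to [0, 1] and the boundary line ⟨R⟩ = 2.785⟨H⟩ − 1.151. The function names are hypothetical, and histidine is treated as neutral for simplicity; the paper's exact conventions may differ.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue lies in [0, 1]."""
    return sum((KD[a] + 4.5) / 9.0 for a in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1 (His neutral)."""
    pos = sum(seq.count(a) for a in "KR")
    neg = sum(seq.count(a) for a in "DE")
    return abs(pos - neg) / len(seq)

def above_uversky_boundary(seq):
    """True if the sequence lies on the disordered side of the boundary line."""
    q, h = mean_net_charge(seq), mean_hydrophobicity(seq)
    return q > 2.785 * h - 1.151

# Toy usage: a highly charged, weakly hydrophobic sequence versus a hydrophobic one.
charged = "EEEEEEEEEE"
hydrophobic = "IIIIIIIIII"
is_charged_disordered = above_uversky_boundary(charged)
is_hydrophobic_disordered = above_uversky_boundary(hydrophobic)
```

Each sequence becomes one point in the (q, h) plane, so a set of sampled transition sequences can be compared directly with the reference distributions of IDPs and globular proteins.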

(A) Examples of h_{A-B} for residues 9, 24, 27 and 43. (B) Total change (d) of h_{A-B} from GA to GB. (C) Slope (K) in the transition region, defined as ϕ_A ∈ [0.2, 0.8]. The δ (D) and its correlation with d and K are shown in (E) and (F).
Similar articles
- Co-Evolutionary Fitness Landscapes for Sequence Design. Tian P, Louis JM, Baber JL, Aniana A, Best RB. Angew Chem Int Ed Engl. 2018 May 14;57(20):5674-5678. doi: 10.1002/anie.201713220. Epub 2018 Mar 25. PMID: 29512300. Free PMC article.
- Smooth functional transition along a mutational pathway with an abrupt protein fold switch. Holzgräfe C, Wallin S. Biophys J. 2014 Sep 2;107(5):1217-1225. doi: 10.1016/j.bpj.2014.07.020. PMID: 25185557. Free PMC article.
- Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches. Sikosek T, Krobath H, Chan HS. PLoS Comput Biol. 2016 Jun 2;12(6):e1004960. doi: 10.1371/journal.pcbi.1004960. eCollection 2016 Jun. PMID: 27253392. Free PMC article.
- Lupas AN, Ponting CP, Russell RB. J Struct Biol. 2001 May-Jun;134(2-3):191-203. doi: 10.1006/jsbi.2001.4393. PMID: 11551179. Review.
Cited by
- A high-throughput predictive method for sequence-similar fold switchers. Kim AK, Looger LL, Porter LL. Biopolymers. 2021 Oct;112(10):e23416. doi: 10.1002/bip.23416. Epub 2021 Jan 19. PMID: 33462801. Free PMC article.
- Galaz-Davison P, Ferreiro DU, Ramírez-Sarmiento CA. Protein Sci. 2022 Jun;31(6):e4337. doi: 10.1002/pro.4337. PMID: 35634768. Free PMC article.
- Identification of a covert evolutionary pathway between two protein folds. Chakravarty D, Sreenivasan S, Swint-Kruse L, Porter LL. Nat Commun. 2023 Jun 1;14(1):3177. doi: 10.1038/s41467-023-38519-0. PMID: 37264049. Free PMC article.
- A systematic analysis of regression models for protein engineering. Michael R, Kæstel-Hansen J, Mørch Groth P, Bartels S, Salomon J, Tian P, Hatzakis NS, Boomsma W. PLoS Comput Biol. 2024 May 3;20(5):e1012061. doi: 10.1371/journal.pcbi.1012061. eCollection 2024 May. PMID: 38701099. Free PMC article.
- Mishra S, Looger LL, Porter LL. Biopolymers. 2021 Oct;112(10):e23471. doi: 10.1002/bip.23471. Epub 2021 Sep 9. PMID: 34498740. Free PMC article.
References
- Pruitt K, Brown G, Tatusova T, Maglott D. The NCBI Handbook. Bethesda, MD: National Center for Biotechnology Information; 2012.
Grants and funding
PT and RB were supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.