pubmed.ncbi.nlm.nih.gov

Exploring the sequence fitness landscape of a bridge between protein folds - PubMed

Wed Jan 01 2020

Exploring the sequence fitness landscape of a bridge between protein folds

Pengfei Tian et al. PLoS Comput Biol. 2020.

Abstract

Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences which fold into different structures upon changes of environment, or single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences which can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity for certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is nonetheless still a wide range of sequences which are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
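The Monte Carlo exploration of a combined two-fold landscape described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single-fold energies are replaced by a hypothetical mismatch count against a reference sequence (`toy_energy`), the two energies are combined as -ln(exp(-E_A) + exp(-E_B)) in units of k_BT (`combined_energy`), and moves are single-site substitutions accepted with the Metropolis criterion.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq, reference):
    """Hypothetical single-fold energy: number of mismatches to a reference."""
    return float(sum(a != b for a, b in zip(seq, reference)))

def combined_energy(e_a, e_b):
    """Assumed combination rule: -ln(exp(-E_A) + exp(-E_B)), computed stably."""
    m = min(e_a, e_b)
    return m - math.log(math.exp(-(e_a - m)) + math.exp(-(e_b - m)))

def metropolis(seq, ref_a, ref_b, steps, rng):
    """Single-site Metropolis sampling of sequences on the combined landscape."""
    seq = list(seq)
    e = combined_energy(toy_energy(seq, ref_a), toy_energy(seq, ref_b))
    trajectory = []
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a point mutation
        e_new = combined_energy(toy_energy(seq, ref_a), toy_energy(seq, ref_b))
        if e_new <= e or rng.random() < math.exp(e - e_new):
            e = e_new                    # accept the mutation
        else:
            seq[pos] = old               # reject and restore the residue
        # record the two single-fold energies after each step
        trajectory.append((toy_energy(seq, ref_a), toy_energy(seq, ref_b)))
    return "".join(seq), trajectory

rng = random.Random(0)
final_seq, traj = metropolis("A" * 10, "A" * 10, "W" * 10, 200, rng)
```

Plotting the recorded pairs from `traj` against each other gives a toy analogue of a path between two basins; with real coevolutionary energies in place of `toy_energy`, the same loop structure would apply.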

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1**
Sequence-based models for the GA and GB domains of streptococcal protein G. Many sequences (A) fold to each structure (B): e.g. structures of three naturally occurring sequences with the GA fold (pdb ID 2fs1, 1gjs and 2j5y) and three with the GB fold (pdb ID 1pga, 2lum and 1igd) are shown on the left and right respectively. Contacts between pairs of residues in the native structure impose mutual constraints on the types of residues which can occupy these positions in the sequence alignment. For instance, strong covariance is detected between the amino acids at residues 21 and 30 for GA sequences and between residues 46 and 51 for GB sequences. The Cβ atoms of these residues are shown as yellow spheres. The UniProtKB IDs of the example GA sequences are Q51918_FINMA, G5KGV3_9STRE, G5K7M6_9STRE and Q56192_STAXY; those of the example GB sequences are SPG1_STRSG, E4KPW8_9LACT, F9P4J6_STRCV and G5JZF8_9STRE. (C) Simple model for the emergence of new folds via evolutionary drift in sequence space between basins of attraction corresponding to the GA and GB domains.

**Fig 2. Properties of the single-fold models.**
(A) Distribution of E_GA for the GA homologs used to parameterize E_GA (cyan), synthetic sequences dominated by the GA fold at equilibrium (blue), unstable synthetic sequences (yellow) and randomly generated sequences (grey). (B) Distribution of E_GB for the GB homologs used to parameterize E_GB (purple), synthetic sequences dominated by the GB fold at equilibrium (red), unstable synthetic sequences (yellow) and randomly generated sequences (grey). (C) The correlation between the folding temperature (T_m) and E_GA for synthetic sequences of GA. Stable mutants are shown as blue symbols; unstable mutants are shown as yellow symbols, with T_m set to 20°C for plotting purposes. (D) The correlation between T_m and E_GB for experimental mutants of GB (stable: red; unstable: yellow, with T_m set to 20°C).

**Fig 3. One-dimensional energy landscape capturing fold switch.**
(A) The committor for reaching the GA fold, ϕ_A, is plotted for the experimentally characterized mutant sequences with blue (GA fold) and red (GB fold) symbols. The mean and standard deviation of ϕ_A for an equilibrium sample of sequences at given values of the optimized coordinate E_{A − B}^{opt} are shown by black symbols and error bars. The theoretical committor from a 1D diffusion model is shown in yellow. (B) The ϕ_A values (colours) are projected onto E_GA and E_GB for each sequence. Purple and blue broken lines are perpendicular to the original coordinate E_{A − B} = E_GA − E_GB and the optimized coordinate E_{A − B}^{opt} = λE_GA − E_GB respectively (λ = 1.13). (C) Free energy profile of the combined model for the natural mutations (blue), natural mutations with stability constraints (green) and the binary mutations (red). The free energy (in sequence space) was estimated using the weighted histogram analysis method, based on umbrella sampling on the coordinate E_{A − B}^{opt}. (D) The profile of position-dependent diffusion coefficients for the natural mutations (blue) and the binary mutations (red).
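As a rough illustration of how a theoretical committor can be obtained from a 1D diffusion model, the sketch below uses the standard result for diffusion on a free energy profile F(x) (in units of k_BT) with position-dependent diffusion coefficient D(x): ϕ_A(x) is the integral of exp(F)/D from x to the B boundary, divided by the same integral over the whole interval. The grid, the double-well profile and the constant diffusion coefficient in the usage lines are hypothetical and are not the profiles of panels (C) and (D).

```python
import math

def committor_1d(xs, free_energy, diffusion):
    """Committor for reaching xs[0] (basin A) before xs[-1] (basin B).

    xs: increasing grid along the coordinate
    free_energy: F(x) in units of k_BT on that grid
    diffusion: position-dependent diffusion coefficient D(x) on that grid
    """
    w = [math.exp(f) / d for f, d in zip(free_energy, diffusion)]
    # cumulative trapezoidal integral of exp(F)/D from xs[0] to each grid point
    cum = [0.0]
    for i in range(1, len(xs)):
        cum.append(cum[-1] + 0.5 * (w[i] + w[i - 1]) * (xs[i] - xs[i - 1]))
    total = cum[-1]
    # phi_A = 1 at the A boundary and 0 at the B boundary
    return [1.0 - c / total for c in cum]

# Hypothetical symmetric double well with a 5 k_BT barrier at x = 0
xs = [-1.0 + 0.02 * i for i in range(101)]
F = [5.0 * (1.0 - x * x) ** 2 for x in xs]
D = [1.0] * len(xs)
phi = committor_1d(xs, F, D)
```

For a symmetric profile with constant D, the committor at the barrier top is 0.5, which is the usual definition of the transition region.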

**Fig 4. Fitness landscape.**
(A) Potential energy landscape of the combined model. (B) Contribution of entropy to free energy. (C) 2D free energy landscape of the fold switch for natural mutation simulations. (D) Example of three transition paths from GA basin to the GB basin. Examples of transition paths (E) with stability constraints (shaded and crossed box represents forbidden region where one or both folds is predicted to be unstable), and (F) using only “binary” mutations. The free energy surface in (F) is the one in which only binary mutations are allowed. All energies are in k_BT.

**Fig 5. The Uversky plot divides proteins into folded globular and intrinsically disordered proteins based on their mean net charge (q) and the mean hydrophobicity (h) [80].**
In each plot, the dashed line represents the boundary between the two subsets described by Uversky [80]. We calculated the q, h of 10000 randomly selected transition sequences, defined as having ϕ_A within [0.49,0.51], from the simulations (A) without and (B) with stability constraints (one symbol for each sequence; probability density contours containing 10, 40 and 70% of the data are also shown). The q, h of 694 known IDPs from the DisProt database [81] and 7957 globular proteins from the Top8000 database [82] are shown in (C) and (D) respectively. Sequences of GA and GB wild-type are shown with cyan and purple stars, respectively, in (B). (E) and (F) are respectively heat map and contour map representations of the IDP propensity P(IDP|q, h). The legends (%) represent the probability of being an IDP P(IDP|q, h) for each (q, h) combination.
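The (q, h) coordinates of a sequence can be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: hydrophobicity is taken as the Kyte-Doolittle hydropathy rescaled to [0, 1] and averaged over the whole sequence, net charge counts K and R as +1 and D and E as -1 (histidine treated as neutral), and the boundary is the commonly quoted Uversky line q = 2.785 h - 1.151. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute net charge per residue (assumes K, R = +1; D, E = -1)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)

def is_predicted_disordered(q, h):
    """True if (q, h) lies on the disordered side of q = 2.785 h - 1.151."""
    return q > 2.785 * h - 1.151

# Usage on two artificial sequences (not sequences from the paper)
q_charged, h_charged = mean_net_charge("KKKK"), mean_hydrophobicity("KKKK")
q_greasy, h_greasy = mean_net_charge("IIII"), mean_hydrophobicity("IIII")
```

A highly charged, weakly hydrophobic sequence falls on the disordered side of the line, while a neutral, strongly hydrophobic one falls on the folded side.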

**Fig 6. Single-site amino acid propensity changes in fold switching.**
(A) Examples of h^{A-B} for residues 9, 24, 27 and 43. (B) Total change (d) of h^{A-B} from GA to GB. (C) Slope (K) in the transition region, defined as ϕ_A ∈ [0.2,0.8]. (D) The δ values; their correlations with d and K are shown in (E) and (F).

Cited by

A high-throughput predictive method for sequence-similar fold switchers.
Kim AK, Looger LL, Porter LL. Biopolymers. 2021 Oct;112(10):e23416. doi: 10.1002/bip.23416. Epub 2021 Jan 19. PMID: 33462801.
Coevolution-derived native and non-native contacts determine the emergence of a novel fold in a universally conserved family of transcription factors.
Galaz-Davison P, Ferreiro DU, Ramírez-Sarmiento CA. Protein Sci. 2022 Jun;31(6):e4337. doi: 10.1002/pro.4337. PMID: 35634768.
Identification of a covert evolutionary pathway between two protein folds.
Chakravarty D, Sreenivasan S, Swint-Kruse L, Porter LL. Nat Commun. 2023 Jun 1;14(1):3177. doi: 10.1038/s41467-023-38519-0. PMID: 37264049.
A systematic analysis of regression models for protein engineering.
Michael R, Kæstel-Hansen J, Mørch Groth P, Bartels S, Salomon J, Tian P, Hatzakis NS, Boomsma W. PLoS Comput Biol. 2024 May 3;20(5):e1012061. doi: 10.1371/journal.pcbi.1012061. eCollection 2024 May. PMID: 38701099.
A sequence-based method for predicting extant fold switchers that undergo α-helix ↔ β-strand transitions.
Mishra S, Looger LL, Porter LL. Biopolymers. 2021 Oct;112(10):e23471. doi: 10.1002/bip.23471. Epub 2021 Sep 9. PMID: 34498740.

References

1. Pruitt K, Brown G, Tatusova T, Maglott D. The NCBI Handbook. Bethesda, MD: National Center for Biotechnology Information; 2012.
2. Gsponer J, Futschik ME, Teichmann SA, Babu MM. Tight regulation of unstructured proteins: from transcript synthesis to protein degradation. Science. 2008;322:1365–1368. doi: 10.1126/science.1163581
3. van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, et al. Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114:6589–6631. doi: 10.1021/cr400525m
4. Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 2004;14:208–216. doi: 10.1016/j.sbi.2004.03.011
5. Chothia C. One thousand families for the molecular biologist. Nature. 1992;357:543–544. doi: 10.1038/357543a0

Grants and funding

PT and RB were supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
