Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm
Roger A Craig et al. Nucleic Acids Res. 2010 Jan.
Abstract
Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences while placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, finding codon nucleotide distributions that exactly match a given target distribution of amino acids turns out to be a highly nontrivial computational task. We first showed that, for a given target distribution, an exact solution may not exist at all. We then formulated the task as a constrained optimization problem and developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. Compared with the previous gradient descent method on various objective functions, the new method consistently gave distributions closer to the target, as measured by the relative entropy between the calculated and the target distributions. To simulate actual laboratory practice, new objective functions were designed to allow for two separate sets of codons in seeking a better match to the target amino acid distribution.
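The mapping from codon nucleotide distributions to an amino acid distribution, and the relative entropy used to score it, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the three codon positions are assumed to be sampled independently, the standard genetic code is assumed, stop codons are reported under `*` without renormalization, and the direction of the relative entropy (target relative to calculated) is an assumption.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(codon_dist):
    """Return amino acid probabilities for one site.

    codon_dist is a list of three dicts (one per codon position), each
    mapping a base to its probability. Positions are assumed independent.
    """
    result = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= codon_dist[pos].get(base, 0.0)
        result[aa] = result.get(aa, 0.0) + p
    return result


def relative_entropy(target, calculated, eps=1e-12):
    """Relative entropy D(target || calculated); eps guards against log(0)."""
    return sum(
        t * math.log(t / max(calculated.get(aa, 0.0), eps))
        for aa, t in target.items()
        if t > 0
    )


# Hypothetical site: A at position 1, T at position 2, G or T at position 3.
site = [{"A": 1.0}, {"T": 1.0}, {"G": 0.5, "T": 0.5}]
calculated = amino_acid_distribution(site)
score = relative_entropy({"M": 0.5, "I": 0.5}, calculated)
```

In this example the codons ATG and ATT are equally likely, so the calculated distribution is half Met and half Ile, and the relative entropy to that same target is zero.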
Figures
![Figure 1.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/e6291c0ff824/gkp906f1.gif)
Schematic illustration of point mutation and cross-over for generating new distributions. The top panel shows a point mutation from ‘0’ to ‘1’ at the third position of the string ‘0101 1011 0011 0001’, which encodes the amounts of nucleotides (A = 5, C = 11, G = 3 and T = 1), translating to a distribution (A = 25%, C = 55%, G = 15% and T = 5%) after normalization. A cross-over is shown for the bottom two strings in the left column, with a pivot point at the middle of each string. It produces the two new strings in the right column, each composed of one substring from each of the two original strings, split at the pivot point, as indicated by the corresponding colors.
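The encoding and the two operators described in this caption can be sketched in a few lines. The 4-bits-per-nucleotide layout in A, C, G, T order follows the caption; the function names, the zero-based indexing, and the second parent string are hypothetical choices made for illustration.

```python
def decode(bits):
    """Decode a 16-bit string (4 bits each for A, C, G, T) into a
    normalized nucleotide distribution. Spaces are ignored."""
    bits = bits.replace(" ", "")
    amounts = [int(bits[i:i + 4], 2) for i in range(0, 16, 4)]
    total = sum(amounts)
    if total == 0:
        raise ValueError("all-zero string cannot be normalized")
    return {base: amount / total for base, amount in zip("ACGT", amounts)}


def point_mutation(bits, index):
    """Flip the bit at a zero-based index (string without spaces)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def crossover(parent_a, parent_b, pivot):
    """Swap the tails of two equal-length strings at the pivot point."""
    child_a = parent_a[:pivot] + parent_b[pivot:]
    child_b = parent_b[:pivot] + parent_a[pivot:]
    return child_a, child_b


distribution = decode("0101 1011 0011 0001")   # A = 5, C = 11, G = 3, T = 1
mutant = point_mutation("0101101100110001", 2)  # flips the third bit
children = crossover("0101101100110001", "1111000011110000", 8)
```

Decoding the caption's string gives amounts summing to 20, hence the 25%, 55%, 15% and 5% shares quoted above.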
![Figure 2.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/787165455a57/gkp906f2.gif)
R-values for calculated distributions achieved by various objective functions with one and two test tubes, and by the method of Wang and Saven (5).
![Figure 3.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/2f1cb936bccb/gkp906f3.gif)
Site 28 of SH3 domain with one and two codon distributions using the RE + LS objective function.
![Figure 4.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/59dd19c5289b/gkp906f4.gif)
Single codon nucleotide probability distribution for site 28 of SH3 domain using the RE + LS objective function.
![Figure 5.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/f39ebdd4c666/gkp906f5.gif)
Nucleotide probability distributions for two weighted codons for site 28 of SH3 domain generated using the RE + LS objective function. The first and second codon nucleotide distributions are weighted 0.29 and 0.71, respectively.
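One way to read the weights in this caption is that the pooled library is a weighted mixture of the amino acid distributions produced by each codon. The sketch below assumes exactly that; the function name and the two input distributions are hypothetical and are not taken from the paper.

```python
def mix(dist_a, dist_b, weight_a):
    """Weighted mixture of two amino acid distributions.

    weight_a is the weight of the first codon; the second codon
    receives 1 - weight_a (assumed mixing model).
    """
    weight_b = 1.0 - weight_a
    keys = set(dist_a) | set(dist_b)
    return {
        aa: weight_a * dist_a.get(aa, 0.0) + weight_b * dist_b.get(aa, 0.0)
        for aa in keys
    }


# Hypothetical per-codon amino acid distributions, mixed 0.29 : 0.71.
first = {"A": 0.6, "G": 0.4}
second = {"G": 0.5, "S": 0.5}
pooled = mix(first, second, 0.29)
```

The mixture remains a probability distribution because the two weights sum to one.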
![Figure 6.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/a8e9ce0ec12b/gkp906f6.gif)
Site 54 of SH3 domain with one and two codon distributions using the RE + LS objective function.
![Figure 7.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/6eafb27a8d37/gkp906f7.gif)
Single codon nucleotide probability distribution for site 54 of SH3 domain using the RE + LS objective function.
![Figure 8.](https://cdn.ncbi.nlm.nih.gov/pmc/blobs/234f/2811015/3f8973506dc8/gkp906f8.gif)
Nucleotide probability distributions for two weighted codons for site 54 of SH3 domain generated using the RE + LS objective function. The first and second codon nucleotide distributions are weighted 0.25 and 0.75, respectively.
Similar articles
- Designing gene libraries from protein profiles for combinatorial protein experiments. Wang W, Saven JG. Nucleic Acids Res. 2002 Nov 1;30(21):e120. doi: 10.1093/nar/gnf119. PMID: 12409479. Free PMC article.
- Codon compression algorithms for saturation mutagenesis. Pines G, Pines A, Garst AD, Zeitoun RI, Lynch SA, Gill RT. ACS Synth Biol. 2015 May 15;4(5):604-14. doi: 10.1021/sb500282v. Epub 2014 Oct 30. PMID: 25303315.
- Wolf E, Kim PS. Protein Sci. 1999 Mar;8(3):680-8. doi: 10.1110/ps.8.3.680. PMID: 10091671. Free PMC article.
- Saven JG. Curr Opin Struct Biol. 2002 Aug;12(4):453-8. doi: 10.1016/s0959-440x(02)00347-0. PMID: 12163067. Review.
- Hatfield GW, Roth DA. Biotechnol Annu Rev. 2007;13:27-42. doi: 10.1016/S1387-2656(07)13002-7. PMID: 17875472. Review.
Cited by
- SwiftLib: rapid degenerate-codon-library optimization through dynamic programming. Jacobs TM, Yumerefendi H, Kuhlman B, Leaver-Fay A. Nucleic Acids Res. 2015 Mar 11;43(5):e34. doi: 10.1093/nar/gku1323. Epub 2014 Dec 24. PMID: 25539925. Free PMC article.
- CoLiDe: Combinatorial Library Design tool for probing protein sequence space. Tretyachenko V, Voráček V, Souček R, Fujishima K, Hlouchová K. Bioinformatics. 2021 May 1;37(4):482-489. doi: 10.1093/bioinformatics/btaa804. PMID: 32956450. Free PMC article.
References
- Boder ET, Wittrup KD. Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 1997;15:553–557.
- Hoess RH. Protein design and phage display. Chem. Rev. 2001;101:3205–3218.
- Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht M. Protein design by binary patterning of polar and nonpolar amino acids. Science. 1993;262:1680–1684.
- Balint RF, Larrick JW. Antibody engineering by parsimonious mutagenesis. Gene. 1993;137:109–118.