Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation
Nat Biotechnol. 2014 Dec;32(12):1262-7.
doi: 10.1038/nbt.3026. Epub 2014 Sep 3.
- PMID: 25184501
- PMCID: PMC4262738
- DOI: 10.1038/nbt.3026
John G Doench et al. Nat Biotechnol. 2014 Dec.
Abstract
Components of the prokaryotic clustered, regularly interspaced, short palindromic repeats (CRISPR) loci have recently been repurposed for use in mammalian cells. The CRISPR-associated (Cas)9 can be programmed with a single guide RNA (sgRNA) to generate site-specific DNA breaks, but there are few known rules governing on-target efficacy of this system. We created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. We discovered sequence features that improved activity, including a further optimization of the protospacer-adjacent motif (PAM) of Streptococcus pyogenes Cas9. The results from 1,841 sgRNAs were used to construct a predictive model of sgRNA activity to improve sgRNA design for gene editing and genetic screens. We provide an online tool for the design of highly active sgRNAs for any gene of interest.
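The tiling step described in the abstract can be illustrated with a minimal sketch. It assumes the canonical S. pyogenes Cas9 layout of a 20-nt protospacer followed directly by an NGG PAM and scans both strands of a sequence. The function names and the returned tuple layout are hypothetical, chosen for illustration; this is not the authors' library-design pipeline, and it ignores the extended PAM preferences the paper reports.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(seq, guide_len=20):
    """List every candidate site: guide_len nt followed by an NGG PAM.

    Returns tuples (strand, start, protospacer, pam). 'start' is the
    0-based offset within the scanned strand, so minus-strand offsets
    refer to the reverse-complemented sequence (an illustrative choice).
    """
    seq = seq.upper()
    # A lookahead reports overlapping sites, which tiling requires.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    example = "A" * 20 + "TGG"  # toy sequence with a single plus-strand site
    for site in find_target_sites(example):
        print(site)
```

Scanning with a lookahead rather than a plain match is the one design choice that matters here: adjacent target sites overlap by up to 22 nt, and a non-overlapping search would silently skip most of them.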
Figures

sgRNA activity screens in mouse and human cells. (a) Representation of the sgRNA libraries. Colors represent genes assayed by FACS; light gray indicates genes either poorly expressed or not assayed; dark gray indicates targets not found in the mouse or human genomes. (b) Top: Antibody staining in cells (red) compared to unstained cells (black). Bottom: FACS plots indicating the negative population isolated for each cell surface marker after library transduction. (c) Percent of sgRNAs enriched >10-fold (mouse) or >2-fold (human) in the marker-negative population that were on-target.

Features of sgRNA activity. (a) sgRNA concordance across cell lines. Pairwise comparison between cell lines of sgRNA percent-rank (see Methods for percent-rank calculation) for sgRNAs targeting CD13 or CD33; Spearman rank correlation of 0.87 and 0.80, respectively. (b) Activity maps of sgRNA by cut site position. Exons and 100 nts of flanking intron are represented as lines on the x-axis with gaps marking the remaining intronic sequence. sgRNAs excluded from activity modeling are indicated in gray. Boundary sgRNAs (green) are those where the cut site, between nts 17 and 18, falls between annotated regions (e.g. CDS/intron). All sgRNAs with fold enrichment ≤ 0.25 are grouped at the bottom of the y-axis. Scale bar indicates 500 nt of sequence. (c) Activity as a function of G/C content for the 1,841 CDS-targeting sgRNAs analyzed. The top, middle and bottom lines of the box represent the 25th, 50th, and 75th percentiles, respectively; the whiskers represent the 10th and 90th percentiles. p* = 0.0003, p** = 3 × 10^−11, Kolmogorov-Smirnov test.
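Two quantities in this caption, G/C content and percent-rank, can be sketched in a few lines. The paper's own percent-rank calculation is defined in its Methods, which are not reproduced on this page, so the definition below (fraction of the other values that are strictly lower) is an assumption, as are the function names.

```python
def gc_fraction(guide):
    """Fraction of G or C bases in a guide sequence."""
    guide = guide.upper()
    return (guide.count("G") + guide.count("C")) / len(guide)


def percent_rank(values):
    """Rank each value on a 0-1 scale within its group.

    Assumed definition: (number of values strictly lower) / (n - 1),
    so the lowest value maps to 0.0 and the highest to 1.0.
    Intended to be applied per gene, since activity is compared
    among sgRNAs targeting the same gene.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [sum(other < value for other in values) / (n - 1) for value in values]


if __name__ == "__main__":
    # Hypothetical fold-enrichment values for three guides of one gene.
    print(percent_rank([10, 30, 20]))
    print(gc_fraction("GGCCAATTGGCCAATTGGCC"))
```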

Model of sgRNA activity. (a) p-values of observing the conditional probability of a guide with a percent-rank activity of >0.8 under the null distribution examined at every position including the 4 nt upstream of the sgRNA target site, the 20 nt of sgRNA complementarity, the PAM, and the 3 nt downstream of the sgRNA target sequence. p-values were calculated from the binomial distribution with a baseline probability of 0.2 using 1,841 CDS-targeting guides. (b) Performance evaluation of sgRNA activity prediction scores based on nucleotide features. Scores for 1,841 sgRNAs are divided by quintile (x-axis) and experimentally-determined activity within each prediction group is assessed by sgRNA percent rank, and also binned by quintile (y-axis). (c) Performance validation of sgRNA prediction algorithm. The model was trained on all possible combinations of 8 genes and tested individually on the remaining held-out gene. Each gray line indicates the ROC curve for a held-out gene. The black line is the mean ROC curve. The bar graph inset indicates the Area Under the Curve (AUC) for each gene. (d) Distribution of 1,841 sgRNAs across predicted score quintiles. (e) Simulation of the fraction of most-active sgRNAs, arbitrarily defined as the top 20% of sgRNA for a gene, in hypothetical libraries with 6 sgRNAs per gene. For a library designed with no on-target criteria (null, in red) the values are simply the binomial expansion of 0.2. For the hypothetical library that incorporates sgRNA scoring rules to enrich for highly-active sgRNAs (blue), the model predicts that the top two quintiles of scores (0.6 – 1.0) contain 66.3% of most-active sgRNAs, and thus the values are the binomial expansion of 0.663.
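The simulation in panel (e) can be reproduced approximately with the binomial distribution. The sketch below reads "binomial expansion" as the probability of drawing exactly k most-active sgRNAs among 6 independent picks, with a per-guide success probability of 0.2 (null library) or 0.663 (score-filtered library). Independence between guides and the function names are assumptions for illustration; the values 6, 0.2, and 0.663 come from the caption.

```python
from math import comb


def binomial_pmf(n, p):
    """Probability of exactly k successes in n trials, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_at_least(pmf, k):
    """Probability of k or more successes, given a full pmf list."""
    return sum(pmf[k:])


if __name__ == "__main__":
    guides_per_gene = 6
    null_library = binomial_pmf(guides_per_gene, 0.2)      # no on-target criteria
    scored_library = binomial_pmf(guides_per_gene, 0.663)  # top two score quintiles

    print("k  null    scored")
    for k in range(guides_per_gene + 1):
        print(f"{k}  {null_library[k]:.4f}  {scored_library[k]:.4f}")

    # Chance that a gene is covered by at least two most-active guides.
    print(prob_at_least(null_library, 2), prob_at_least(scored_library, 2))
```

Under the null, the chance that a gene receives no most-active guide at all is 0.8^6 = 0.262144, which is the first entry of the null column.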