Recursive splicing in long vertebrate genes - PubMed
Nature. 2015 May 21;521(7552):371-375.
doi: 10.1038/nature14466. Epub 2015 May 13.
Warren Emmett # 3 , Lorea Blazquez 1 , Ana Faro 4 , Nejc Haberman 1 , Michael Briese 2 5 , Daniah Trabzuni 1 6 , Mina Ryten 1 7 , Michael E Weale 8 , John Hardy 1 , Miha Modic 2 9 , Tomaž Curk 10 , Stephen W Wilson 4 , Vincent Plagnol 3 , Jernej Ule 1 2
- PMID: 25970246
- PMCID: PMC4471124
- DOI: 10.1038/nature14466
Christopher R Sibley et al. Nature. 2015.
Abstract
It is generally believed that splicing removes introns as single units from precursor messenger RNA transcripts. However, some long Drosophila melanogaster introns contain a cryptic site, known as a recursive splice site (RS-site), that enables a multi-step process of intron removal termed recursive splicing. The extent to which recursive splicing occurs in other species and its mechanistic basis have not been examined. Here we identify highly conserved RS-sites in genes expressed in the mammalian brain that encode proteins functioning in neuronal development. Moreover, the RS-sites are found in some of the longest introns across vertebrates. We find that vertebrate recursive splicing requires initial definition of an 'RS-exon' that follows the RS-site. The RS-exon is then excluded from the dominant mRNA isoform owing to competition with a reconstituted 5' splice site formed at the RS-site after the first splicing step. Conversely, the RS-exon is included when preceded by cryptic promoters or exons that fail to reconstitute an efficient 5' splice site. Most RS-exons contain a premature stop codon such that their inclusion can decrease mRNA stability. Thus, by establishing a binary splicing switch, RS-sites demarcate different mRNA isoforms emerging from long genes by coupling cryptic elements with inclusion of RS-exons.
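The splice-site competition described in the abstract can be illustrated with a toy model. This is a minimal sketch, not the authors' method: it scores a 5′ splice site by counting matches to the textbook consensus MAG|GTRAGT (an assumption standing in for a proper splice-site scoring model), and the function names, the decision rule, and the example sequences are all hypothetical.

```python
# Textbook 5' splice site consensus: three exonic bases (M, A, G) followed by
# six intronic bases (G, T, R, A, G, T). Each entry lists the allowed bases.
CONSENSUS = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]


def five_prime_ss_score(exon_end: str, downstream: str) -> int:
    """Count consensus matches (0-9) for a candidate 5' splice site.

    exon_end   -- sequence ending at the exon's last base
    downstream -- sequence starting at the first base after the junction
    """
    site = (exon_end[-3:] + downstream[:6]).upper().replace("U", "T")
    if len(site) != 9:
        raise ValueError("need at least 3 exonic and 6 downstream bases")
    return sum(base in allowed for base, allowed in zip(site, CONSENSUS))


def rs_exon_fate(upstream_exon: str, rs_exon: str, downstream_intron: str) -> str:
    """Toy binary switch (assumed rule): after the first splicing step the
    upstream exon sits directly in front of the RS-exon, so a 5' splice site
    may be reconstituted at the RS-site. If it scores at least as well as the
    RS-exon's own 5' splice site, the RS-exon is treated as skipped."""
    reconstituted = five_prime_ss_score(upstream_exon, rs_exon)
    rs_exon_site = five_prime_ss_score(rs_exon, downstream_intron)
    return "skipped" if reconstituted >= rs_exon_site else "included"


if __name__ == "__main__":
    rs_exon = "GTAAGACCTGACTGATAC"   # hypothetical RS-exon, starts with GTAAG
    intron = "GTACGTTT"              # hypothetical start of the downstream intron
    print(rs_exon_fate("ATGGCCAAG", rs_exon, intron))  # exon ending in AAG
    print(rs_exon_fate("ATGGCCTTC", rs_exon, intron))  # cryptic exon ending in TTC
```

In this toy setting, the upstream exon ending in AAG reconstitutes a strong site and the RS-exon is skipped, whereas the exon ending in TTC does not, and the RS-exon is included.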
Figures

a) GO term analysis of genes >150 kb relative to all human genes. All GO terms are associated with enrichment scores >2. b) Log2-fold gene expression ratios following differential expression sequencing (DESeq) analysis of all human protein-coding genes between the brain and all other tissues. Data are represented as Loess smoothing curves after sorting the genes by their maximum length in kb. Hashed vertical line indicates 150 kb gene length. RNA-seq data were obtained from the GTEx consortium. c) Individual scatterplots used to create panel (Fig. 1b) and representing differential expression sequencing (DESeq) analysis of individual genes within indicated tissues compared to the brain. Red dots indicate genes that contain RS-sites, blue dots indicate dystrophin and black dots indicate titin (two long genes most highly expressed in muscle tissues). Grey dots are all remaining genes. d) Differential expression sequencing (DESeq) analysis of individual gene expression after and before differentiation of C2C12 mouse myoblasts (GSM521256) into myogenic lineage (GSM521259), after or before differentiation of mouse embryonic stem cells (GSM1346027) into motor neurons (GSM1346035), or after or before differentiation of hematopoietic stem cells (GSM992931) into erythroid lineage (GSM992934). Loess smoothing curves are shown after sorting the genes by their maximum length in kb. Hashed vertical line indicates 150 kb gene length.

a) Examples of RNA-seq read density patterns for three genes together with their calculated gradients across the (i) first intron >50 kb and (ii) the average across all other >50 kb long introns within the same gene. Gradients represent the change in summated read count every 5 kb since RNA-seq reads are grouped in 5 kb windows and linear regression performed on resulting histograms. b) Density plot indicating the ratio of gradients of all other >50 kb introns within the same gene: the gradient of the first intron >50 kb. Blue hashed line represents ratio of 1. This would indicate that gradients for long introns within the same gene are comparable and transcription is proceeding at a largely constant rate. c) Schematic of the bioinformatics pipeline used to identify novel junctions. d) Ranking of human 5′ splice site pentamer usage genome-wide. e) Nucleotide usage frequency at human 3′ splice sites genome-wide, and branch-point positioning relative to 3′ splice site genome-wide.

RNA-seq (red) read density patterns and normalized FUS iCLIP (green) cross-link density patterns for the a) OPCML b) ROBO2 c) HS6ST3 d) ANK3 e) CADM2 f) NCAM1 g) PDE4D genes within human brains. RNA-seq reads and normalized FUS iCLIP cross-links are grouped in 5 kb windows. RefSeq introns >150 kb were searched for novel junctions and linear regression performed on all Ensembl introns >50 kb in which novel junctions were located. Gene isoforms displayed are those including introns within which significant junctions were identified. Red novel junctions represent significant improvements in goodness-of-fit in both RNA-seq and FUS regression analysis (p<0.01 in both datasets, F-test). Blue novel junctions contact RS-exons. Grey novel junctions were not deemed significant following regression analysis. Zoomed area represents sequence at deep intronic loci surrounding novel junction. Phylo-P conservation track indicates sequence conservation across 46 levels of mammalian evolution.

a) RNA-seq read density patterns for the OPCML gene across 12 different regions of four separate brains. Gene isoform displayed is that which included the long first intron within which a significant novel junction was identified. RNA-seq reads are grouped in 5 kb windows. Dotted arrows indicate location of experimentally derived RS-site.

a) Schematic of primer design used for RT-PCR validation of novel junctions. b-g) RT-PCR analysis of b) CADM2 c) HS6ST3 d) ROBO2 e) PDE4D_1_1 f) PDE4D_1_2 g) PDE4D_2_2 genes around RS-sites using indicated primers. For PDE4D sites, first number after gene name indicates RS-site studied, second number indicates the upstream exon used. See Extended Data Fig. 3g for junctions detected. h) RT-PCR analysis of cadm2a RS-site junction in adult male and female zebrafish embryos, together with an alignment of zebrafish (ZF) cadm2a RS-site to human (HS) CADM2 RS-site. i) Map of consensus splice site location and in-frame termination codons following RS-sites in indicated human genes. Strong consensus splice sites are GTAAG, GTGAG, GTAGG, GTATG. Weak consensus splice sites are GTAAA, GTAAT, GTGGG, GTAAC, GTCAG, GTACG.
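Panel (i) of the legend above maps in-frame termination codons that follow RS-sites. A minimal sketch of such a scan is given here, assuming the reading frame carried over from the upstream exon is known; the function name, the `phase` parameter, and the example sequence are hypothetical and are not taken from the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(rs_exon: str, phase: int = 0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    rs_exon -- sequence starting at the first base after the RS-site
    phase   -- number of leading bases (0, 1 or 2) that complete a codon
               begun in the upstream exon (an assumed input)
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = rs_exon.upper().replace("U", "T")
    # Step through complete codons only, starting at the first full codon.
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    example = "GTAAGACCTGACTGATAC"  # hypothetical RS-exon sequence
    for phase in (0, 1, 2):
        print(phase, first_in_frame_stop(example, phase))
```

The example sequence carries a stop codon in every frame, so the reported position depends only on the phase inherited from the upstream exon.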

Normalized FUS iCLIP read density patterns for the a) Opcml b) Robo2 c) Hs6st3 d) Ank3 e) Cadm1 f) Ncam1 g) Cadm2 h) Pde4d genes within the mouse brain. Normalized FUS iCLIP cross-link sites are grouped in 5 kb windows, and the displayed linear regression lines were computed on resulting histograms. Zoomed area at deep intronic loci represents RS-site sequences conserved from humans to mouse.

a) Number of cassette and constitutive exons starting with motif GURAG. b-d) RT-PCR of CADM2 gene in the frontal cortex using primers indicated in (b) or Fig. 4a. RT-PCR was carried out on one (b) or four (c-d) human brains. In (c), the inclusion of the second RS-exon occurs together with the minor promoter. Two bands are present for both PCR reactions due to the presence of an alternatively spliced exon following the RS-exon. This can result in two distinct long or short isoforms. In (d), the inclusion of the 2nd RS-exon occurs when the 1st RS-exon is included. Schematics in (c-d) represent examined splicing products together with expected length of products. e) RNA-seq read density patterns for the NTM gene and expected human isoforms. RNA-seq reads are grouped in 5 kb windows and linear regression performed on resulting histograms. A cryptic minor promoter/exon detected by RNA-seq is indicated by vertical red line. The annotated RS-exon is indicated by the vertical blue line. Zoomed area represents RS-site sequence at start of the annotated RS-exon. Primers to assess the major and minor promoter products associated with the RS-exon are indicated by coloured arrows. f) RT-PCR of NTM gene around RS-exon using indicated primers. g) RT-PCR analysis of NTM products in which the upstream exon is either derived from the major upstream promoter or the cryptic upstream promoter/exon. RT-PCR was performed in the frontal cortex of three human brains using primer sets indicated by coloured arrows in (e). Schematics represent possible splicing products together with expected length of products. Upper panel assesses RS-exon inclusion, lower panel assesses RS-site junction detection.

a) Qiaxcel analysis and quantification of the splicing intermediates of indicated CADM2 splicing reporter products following transfection in SH-SY5Y cells. Primers used are indicated by red arrows in schematic, together with expected products and their sizes. b) RT-PCR analysis of the zebrafish cadm2a mRNA following in vivo injection of AON-2. Sequencing reveals RS-exon inclusion results in subsequent splicing to additional downstream cryptic elements before the second exon, explaining why RS-exon included product size is larger than expected. c) qRT-PCR analysis of exon-exon junctions surrounding the RS-site containing introns following AON-A1 mediated inhibition of RS-site use of the human CADM1 and ANK3 genes (n=3, 1 experiment) or the zebrafish cadm2a gene (n=7, 3 separate experiments). d) Splice site scores of reconstituted 5′ splice sites following first step of recursive splicing vs. the 5′ splice sites of corresponding recursive exons.

a) UCSC annotated isoforms of the OPCML gene together with spliced ESTs detected across the OPCML locus. Recursive exon is marked in blue, and the preceding exons produced by minor promoter or cryptic splicing of the long first intron are marked in red. b) Lengths of the 9 introns containing the high-confidence RS-sites compared to other introns across vertebrates. Results are an extension of Fig. 4g. c) Boxplot showing the detected number of un-annotated alternative start exons which junction to the dominant second exon of brain expressed genes. Only novel junctions which do not match UCSC/GENCODE transcripts are considered for analysis. Genes are separated into bins based on the first intron length of the canonical isoform. Boxplot presents median, first and third quartile boundaries for each bin. Additional red diamonds indicate mean values for each bin. * represents significance in Mann-Whitney U tests, with significance set at p<10^-10. Only tests between the 100 kb+ bin to other bins are displayed. Right panel shows cartoon of the implications of boxplot results.

a) Schematic of the D. melanogaster recursive splicing mechanism. b) Log2-fold gene expression ratios following differential expression sequencing (DESeq) analysis of all human protein-coding genes between the brain and all other tissues. Data are represented as Loess smoothing curves after defining genes by their maximum length in kb. Hashed vertical line indicates 150 kb. RNA-seq data were obtained from the Illumina Human Body Map 2.0 total RNA-seq library (GEO accession: GSE30611). c) Schematic of the theoretical RNA abundance across long introns demonstrating linear regression analysis performed on introns before/after novel junction consideration. d) All novel junctions identified within CADM1 by RNA-seq data are shown on top of experimentally derived RNA-seq (red) and FUS iCLIP (green) read densities, both grouped in 5 kb windows. The displayed linear regression line was determined after the intron was split at the red novel junction. This split significantly improved the regression in both RNA-seq and FUS iCLIP (p<0.01 in both, F-test). Blue novel junction contacts the RS-exon. Phylo-P sequence conservation scores are shown around the CADM1 RS-site across 46 mammalian species. e) Ratio of after:before gradients at long gene novel junctions in RNA-seq (x-axis) and FUS iCLIP (y-axis) datasets. Black and red dots represent junctions that significantly improve the regression gradient and goodness-of-fit, whereas grey dots show no improvement. Black dots are junctions contacting the sequence of 3′ splice sites, whereas red dots contact the sequence of RS-sites. Hashed lines mark upper and lower quartile ratios for each dataset. f) WebLogo of RS-sites identified by red junctions from panel (e).
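The regression comparison in panels (c) and (d) of the legend above, one line across the whole intron versus separate lines on either side of a candidate junction, can be sketched as a nested-model F statistic. This is a minimal illustration under stated assumptions, not the authors' pipeline: windows are indexed 0, 1, 2, ... in place of 5 kb bins, each segment gets an ordinary least-squares line, and the function names and example counts are hypothetical. Converting the statistic to a p-value would additionally need an F distribution with 2 and n − 4 degrees of freedom (for instance `scipy.stats.f.sf`).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss


def split_f_statistic(window_counts, split_index):
    """F statistic comparing one line (2 parameters) against two lines
    (4 parameters) fitted left and right of split_index."""
    n = len(window_counts)
    if split_index < 2 or n - split_index < 2 or n <= 4:
        raise ValueError("need at least two windows per side and more than four in total")
    xs = list(range(n))
    rss_single = fit_line(xs, window_counts)[2]
    rss_split = (fit_line(xs[:split_index], window_counts[:split_index])[2]
                 + fit_line(xs[split_index:], window_counts[split_index:])[2])
    if rss_split == 0:
        # Perfect split fit: avoid dividing by zero.
        return float("inf") if rss_single > 0 else 0.0
    return ((rss_single - rss_split) / 2) / (rss_split / (n - 4))


if __name__ == "__main__":
    # Hypothetical read counts: a "sawtooth" that resets at window 5,
    # and a steady decline with no reset.
    sawtooth = [100, 91, 79, 71, 60, 99, 90, 81, 69, 61]
    steady = [100, 91, 79, 71, 60, 49, 40, 31, 19, 11]
    print(split_f_statistic(sawtooth, 5))
    print(split_f_statistic(steady, 5))
```

With these made-up counts the sawtooth profile gives a far larger statistic than the steady decline, which is the pattern that would flag a junction as improving the fit.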

a) RT-PCR validation of recursive splicing in ANK3 and CADM1 genes. b) Consensus splice site location and in-frame termination codons at RS-exons in indicated human genes. c-d) PhyloP conservation scores aligned at (c) RS-sites and (d) 5′ splice site of RS-exons. Conservation at the two nearest cryptic 5′ splice sites following RS-exons (“Nearest 5′ splice site”) and the canonical 5′ and 3′ splice sites in the same genes are also displayed. e) Schematic of the exon definition model and AON-A1 design strategy. f-g) qRT-PCR analysis of RS-site junctions in (f) human CADM1 and ANK3 genes (n=4 for NS, n=5 for AON-A1, 2 separate experiments) or (g) zebrafish cadm2a gene following treatment with AON-A1 (n=7, 3 separate experiments). h) qRT-PCR analysis of intronic RNA upstream of RS-sites (up. intron) in CADM1 and ANK3 genes following treatment with AON-A1. Location of primer pair is indicated by red arrow in schematic, and expected changes in intronic abundance indicated by grey triangles (n=4 for NS, n=5 for AON-A1, 2 separate experiments). i) qRT-PCR analysis of zebrafish cadm2a mRNA using two separate primer sets targeting constitutive exons following in vivo injection of AON-A1 (n=7, 3 separate experiments). j) qRT-PCR analysis of human CADM1 and ANK3 mRNAs following 48 hr treatment with AON-A1. mRNA for both genes was assessed in nuclear fractions (n=4 for NS, n=5 for AON-A1, 2 separate experiments). For relevant panels, * represents p<0.05 determined by two-tailed Student's t-test and values are mean ± S.D. Unless indicated otherwise, primers are indicated by coloured arrows within schematics. Replicate data are shown in Source Data Fig. 2.

a) Schematic of splice site competition model, the CADM2 splicing reporter P1 variants, and AON-A2 design strategy. b-c) Qiaxcel analysis of (b) indicated CADM2 splicing reporter products following transfection in SH-SY5Y cells (n=3-5, 2 separate experiments), or (c) human CADM1 and ANK3 genes following 48 hr treatment with AON-A2 (n=4 for CADM1, n=5 for ANK3, 2 separate experiments). d) Quantification of CADM1 and ANK3 RS-exon inclusion following treatment with AON-A2 then DMSO or cycloheximide in SH-SY5Y cells (n=4, 2 separate experiments). For relevant panels, * represents p<0.05 determined by two-tailed Student's t-test and values are mean ± S.D. Primers used are indicated by red arrows in schematics. Replicate data are shown in Source Data Fig. 3.

a) RNA-seq read density patterns in the CADM2 gene shown in 5 kb windows, with linear regression performed after the first intron is split at the two RS-sites indicated with blue vertical lines. Isoforms expressed from the dominant and minor promoters in human frontal cortex tissue are shown, and primer locations used for (b) indicated by coloured arrows. Grey forward primer is located in the first exon of dominant isoform, blue forward primer is located in the first RS-exon (1st RS-exon), red forward primer is located in the first exon of alternative isoform (P2). Zoomed area represents the sequence at the start of the second RS-exon. b-c) RT-PCR analysis of RS-exon inclusion in (b) indicated CADM2 isoforms or (c) indicated NTM isoforms (n=4 and n=3 respectively, Extended Data Fig. 7). Values are mean ± S.D. d) Schematic of CADM2 splicing reporter variants P1 and P1-m3, based on the dominant CADM2 isoform (white), and P2 and P2-m1, based on the minor CADM2 isoform (red). Splice site scores for reconstituted and RS-exon 5′ splice sites are indicated. e-f) Qiaxcel analysis of indicated CADM2 splicing reporter products following transfection in SH-SY5Y cells (n=3 or n=4, 2 separate experiments). The expected size of PCR products is shown next to each electropherogram. g) Lengths of the 9 introns containing high-confidence RS-sites compared to other vertebrate introns. h) Histogram of human gene lengths plotted alongside the percentage of genes with RS-site-containing novel junctions. i) Schematic representation of the mechanism of recursive splicing and the binary splicing switch as described in main text. For relevant panels, replicate data are shown in Source Data Fig. 4.
Comment in
- Molecular biology: Splicing does the two-step. Cook-Andersen H, Wilkinson MF. Nature. 2015 May 21;521(7552):300-1. doi: 10.1038/nature14524. Epub 2015 May 13. PMID: 25970243. Free PMC article.
Similar articles
- Molecular and genetic dissection of recursive splicing. Joseph B, Scala C, Kondo S, Lai EC. Life Sci Alliance. 2021 Nov 10;5(1):e202101063. doi: 10.26508/lsa.202101063. Print 2022 Jan. PMID: 34759052. Free PMC article.
- Genome-wide identification of zero nucleotide recursive splicing in Drosophila. Duff MO, Olson S, Wei X, Garrett SC, Osman A, Bolisetty M, Plocik A, Celniker SE, Graveley BR. Nature. 2015 May 21;521(7552):376-9. doi: 10.1038/nature14475. Epub 2015 May 13. PMID: 25970244. Free PMC article.
- Computational analysis of splicing errors and mutations in human transcripts. Kurmangaliyev YZ, Gelfand MS. BMC Genomics. 2008 Jan 14;9:13. doi: 10.1186/1471-2164-9-13. PMID: 18194514. Free PMC article.
- Exonization of transposed elements: A challenge and opportunity for evolution. Schmitz J, Brosius J. Biochimie. 2011 Nov;93(11):1928-34. doi: 10.1016/j.biochi.2011.07.014. Epub 2011 Jul 26. PMID: 21787833. Review.
- Searching for splicing motifs. Chasin LA. Adv Exp Med Biol. 2007;623:85-106. doi: 10.1007/978-0-387-77374-2_6. PMID: 18380342. Review.
Cited by
- Background splicing as a predictor of aberrant splicing in genetic disease. Alexieva D, Long Y, Sarkar R, Dhayan H, Bruet E, Winston RM, Vorechovsky I, Castellano L, Dibb NJ. RNA Biol. 2022;19(1):256-265. doi: 10.1080/15476286.2021.2024031. Epub 2021 Dec 31. PMID: 35188075. Free PMC article.
- Analysis of Pathogenic Pseudoexons Reveals Novel Mechanisms Driving Cryptic Splicing. Keegan NP, Wilton SD, Fletcher S. Front Genet. 2022 Jan 24;12:806946. doi: 10.3389/fgene.2021.806946. eCollection 2021. PMID: 35140743. Free PMC article.
- Regulatory sites for splicing in human basal ganglia are enriched for disease-relevant information. Guelfi S, D'Sa K, Botía JA, Vandrovcova J, Reynolds RH, Zhang D, Trabzuni D, Collado-Torres L, Thomason A, Quijada Leyton P, Gagliano Taliun SA, Nalls MA; International Parkinson’s Disease Genomics Consortium (IPDGC); UK Brain Expression Consortium (UKBEC); Small KS, Smith C, Ramasamy A, Hardy J, Weale ME, Ryten M. Nat Commun. 2020 Feb 25;11(1):1041. doi: 10.1038/s41467-020-14483-x. PMID: 32098967. Free PMC article.
- Recursive splicing is a rare event in the mouse brain. Moon S, Zhao YT. PLoS One. 2022 Jan 28;17(1):e0263082. doi: 10.1371/journal.pone.0263082. eCollection 2022. PMID: 35089962. Free PMC article.
- Scekic-Zahirovic J, Sendscheid O, El Oussini H, Jambeau M, Sun Y, Mersmann S, Wagner M, Dieterlé S, Sinniger J, Dirrig-Grosch S, Drenner K, Birling MC, Qiu J, Zhou Y, Li H, Fu XD, Rouaux C, Shelkovnikova T, Witting A, Ludolph AC, Kiefer F, Storkebaum E, Lagier-Tourenne C, Dupuis L. EMBO J. 2016 May 17;35(10):1077-97. doi: 10.15252/embj.201592559. Epub 2016 Mar 7. PMID: 26951610. Free PMC article.