Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells
- ️Tue Jan 01 2013
Alla A Sigova et al. Proc Natl Acad Sci U S A. 2013.
Abstract
Many long noncoding RNA (lncRNA) species have been identified in mammalian cells, but the genomic origin and regulation of these molecules in individual cell types is poorly understood. We have generated catalogs of lncRNA species expressed in human and murine embryonic stem cells and mapped their genomic origin. A surprisingly large fraction of these transcripts (>60%) originate from divergent transcription at promoters of active protein-coding genes. The divergently transcribed lncRNA/mRNA gene pairs exhibit coordinated changes in transcription when embryonic stem cells are differentiated into endoderm. Our results reveal that transcription of most lncRNA genes is coordinated with transcription of protein-coding genes.
Conflict of interest statement
The authors declare no conflict of interest.
Figures

Most lncRNAs are associated with active protein-coding genes in hESCs. (A) Schematic diagram of pipeline for identification of lncRNAs in hESCs. An “initial RNA pool” was compiled from transcripts assembled de novo from RNA-seq reads (this study; SI Materials and Methods) and published data (20). Four criteria required for the selection of expressed transcripts from this pool are indicated in red. Transcripts were required to be expressed from a high-confidence start site (occupied by H3K4me3), to be noncoding [lacking features of protein-coding RNAs as defined by the CPC (28)], to be long (>100 nt), and to be nonrepetitive. (B) Summary of various types and numbers of lncRNA loci in hESCs, which are listed in Dataset S1. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in nucleosomes with histone 3 acetylated at lysine 27 (H3K27Ac). Enriched regions for H3K27Ac are available in Dataset S2. (C) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3-modified nucleosomes (48) together with reads for polyadenylated RNA in the vicinity of CAPN10. Transcription at lncRNA locus 959 generates three alternatively spliced lncRNA transcripts that are divergent from CAPN10. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. The scale is indicated in the upper right. (D) Distribution of TSS of lncRNAs relative to the TSS of protein-coding genes. Coding regions are normalized to equal length, and the regions upstream of associated promoters are divided into one hundred 100-bp bins. Distance between TSS of protein-coding gene and 5′ end of lncRNA is indicated on x axis and expressed in kilobases (kb). Antisense lncRNA loci are indicated in red. Sense lncRNA loci are indicated in blue.
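
The four selection criteria and the 2-kb promoter-association rule described in this caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record types, field names, and function names are hypothetical, the inputs (H3K4me3 occupancy, coding-potential call, repeat status) are assumed to be precomputed, and treating "divergent" as "opposite strand and starting at or upstream of the gene TSS" is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one assembled transcript (fields are assumptions)."""
    name: str
    length_nt: int
    tss_in_h3k4me3_peak: bool  # start site occupied by H3K4me3
    predicted_coding: bool     # call from a coding-potential classifier
    repetitive: bool


def passes_lncrna_filters(t: Transcript, min_length_nt: int = 100) -> bool:
    """Apply the four criteria from the caption: high-confidence start site,
    noncoding, long (>100 nt), and nonrepetitive."""
    return (
        t.tss_in_h3k4me3_peak
        and not t.predicted_coding
        and t.length_nt > min_length_nt
        and not t.repetitive
    )


@dataclass(frozen=True)
class StartSite:
    """Hypothetical 5' end of a transcript or gene."""
    chrom: str
    position: int
    strand: str  # "+" (Watson) or "-" (Crick)


def is_promoter_associated(lnc: StartSite, gene: StartSite, window_bp: int = 2000) -> bool:
    """lncRNA 5' end lies within window_bp of the protein-coding TSS."""
    return lnc.chrom == gene.chrom and abs(lnc.position - gene.position) <= window_bp


def is_divergent(lnc: StartSite, gene: StartSite, window_bp: int = 2000) -> bool:
    """Assumed definition: promoter-associated, on the opposite strand, and
    starting at or upstream of the gene TSS so the two are transcribed apart."""
    if not is_promoter_associated(lnc, gene, window_bp) or lnc.strand == gene.strand:
        return False
    if gene.strand == "+":
        return lnc.position <= gene.position
    return lnc.position >= gene.position
```

Under these assumptions, an antisense lncRNA starting 500 bp upstream of a plus-strand gene counts as divergent, whereas an antisense lncRNA starting inside the gene body does not.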

lncRNAs are derived from divergent transcription of active protein-coding genes in hESCs. (A) Alignment of GRO-seq reads for the 2,318 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kilobases. The y axis indicates the average number of uniquely mapped GRO-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (B) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3-modified nucleosomes (48) together with GRO-seq reads and reads for polyadenylated RNA in the vicinity of STAM. Transcription at lncRNA locus 3182 generates two alternatively spliced lncRNA transcripts that are divergent from STAM. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq, GRO-seq, and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads that map to Watson (purple) and Crick (magenta) strands of genomic DNA are shown separately. The scale is indicated in the upper right. (C) Alignment of RNA-seq reads for the 2,318 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kb. The y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph.
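
The binned, normalized read profiles described in this caption (250-bp bins around the TSS, averaged over genes and scaled to reads per bin per million uniquely mapped reads) can be sketched as follows. This is an illustrative sketch only: the function name, the tuple layout of reads and TSSs, and the choice to orient positions by gene strand and to report sense and antisense reads relative to each gene are assumptions, not details taken from the source.

```python
def tss_profile(reads, tss_list, total_mapped_reads, bin_size=250, flank=2000):
    """Average strand-specific read density around a set of TSSs.

    reads    : iterable of (chrom, position, strand) for uniquely mapped reads
    tss_list : list of (chrom, position, strand) for the genes of interest
    Returns (sense, antisense): two lists with one value per bin, in reads per
    bin per million mapped reads, averaged over the genes.
    """
    n_bins = (2 * flank) // bin_size
    sense = [0.0] * n_bins
    antisense = [0.0] * n_bins
    reads = list(reads)

    # Naive all-against-all loop: fine for a sketch, too slow for real data.
    for g_chrom, g_pos, g_strand in tss_list:
        orientation = 1 if g_strand == "+" else -1
        for r_chrom, r_pos, r_strand in reads:
            if r_chrom != g_chrom:
                continue
            # Position relative to the TSS, measured in the gene's direction.
            rel = (r_pos - g_pos) * orientation
            if not -flank <= rel < flank:
                continue
            idx = (rel + flank) // bin_size
            if r_strand == g_strand:
                sense[idx] += 1
            else:
                antisense[idx] += 1

    # Average over genes, then scale to "per million mapped reads".
    scale = len(tss_list) * (total_mapped_reads / 1_000_000)
    return [c / scale for c in sense], [c / scale for c in antisense]
```

With the defaults, the profile has 16 bins; bin 8 covers the first 250 bp downstream of the TSS and bin 7 the first 250 bp upstream, which is where divergent antisense signal would be expected to appear.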

Most lncRNAs are divergently transcribed from protein-coding genes in mESCs. (A) Summary of various types and numbers of lncRNA loci in the mESC catalog, which are listed in Dataset S1. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in H3K27Ac (49). Enriched regions for H3K27Ac are available in Dataset S2. (B) Example of lncRNA locus whose 5′ end occurs within 2 kb of the TSS of a protein-coding gene (promoter-associated lncRNA). Gene tracks represent ChIP-seq data for H3K4me3 modified nucleosomes (this study), together with GRO-seq reads and reads for polyadenylated RNA in the vicinity of Nol10. Transcription at lncRNA locus 1160 generates lncRNA transcripts that are divergent from Nol10. The x axis represents the linear sequence of genomic DNA, and the y axis represents the total number of ChIP-seq, GRO-seq, and RNA-seq mapped reads. RNA-seq reads that map to Watson (blue) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads that map to Watson (purple) and Crick (magenta) strands of genomic DNA are shown separately. The scale is indicated above the track. (C) Alignment of GRO-seq reads for the 1,030 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kb. The y axis represents average number of uniquely mapped GRO-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (D) Alignment of RNA-seq reads for the 1,030 protein-coding genes that contain lncRNAs within 2 kb of their TSS. Reads are aligned in 250-bp bins. The x axis indicates the distance from the TSS in kilobases. The y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph.

Divergent lncRNA/mRNA pairs exhibit coordinated changes in transcription as ESCs differentiate into endoderm. (A) Summary of the genomic distribution of lncRNA loci 48 h after induction of endodermal differentiation in hESCs. Diagrams at right depict lncRNA loci as red lines, protein-coding genes as blue lines, and an enhancer as an open box. An arrow indicates direction of transcription initiation. Enhancer-associated lncRNAs overlap or originate at genomic regions enriched in H3K27Ac. Enriched regions for H3K27Ac are available in Dataset S2. (B) Alignment of GRO-seq reads 48 h after induction of endodermal differentiation in hESCs for the 2,680 protein-coding genes that contain lncRNAs within 2 kb of their TSS. The x axis indicates the distance from the TSS in kilobases, and the y axis indicates the average number of uniquely mapped GRO-seq reads normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on the graph. (C) Alignment of RNA-seq reads 48 h after induction of endodermal differentiation in hESCs for the 2,680 protein-coding genes that contain lncRNAs within 2 kb of their TSS. The x axis indicates the distance from the TSS in kilobases, and the y axis indicates the average number of uniquely mapped RNA-seq counts normalized to reads per genomic bin per million uniquely mapped reads. Reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately as indicated on graph. (D) Example of lncRNA/mRNA pairs exhibiting coordinated transcriptional induction 48 h after hESCs were differentiated toward the endoderm. Gene tracks represent GRO-seq data in the vicinity of GATA6. Divergent transcription generates antisense lncRNA locus 5689 upstream of GATA6. The x axis represents the linear sequence of genomic DNA, and the y axis represents the number of GRO-seq reads normalized to total number of mapped reads. GRO-seq reads that map to Watson (black) and Crick (red) strands of genomic DNA are shown separately. GRO-seq reads mapped to the Crick (red) strand of genomic DNA are shown flipped/rotated beneath. The scale is indicated in kilobases (kb) above the track. (E) Example of lncRNA/mRNA pairs exhibiting coordinated transcriptional induction 48 h after hESCs were differentiated toward the endoderm. Gene tracks represent GRO-seq data in the vicinity of LHX5. Divergent transcription generates antisense lncRNA locus 4010 upstream of LHX5. The x axis represents the linear sequence of genomic DNA, and the y axis represents the number of GRO-seq reads normalized to total number of mapped reads. (F) Coordinate transcriptional induction of lncRNA/mRNA gene pairs. A total of 683 lncRNA/mRNA pairs were selected, in which the numbers of GRO-seq reads of mRNA increased at least 1.25-fold after 48 h of endodermal differentiation. The average number of uniquely mapped GRO-seq reads from the strands encoding the mRNA transcripts is shown in black (Upper). The average number of uniquely mapped GRO-seq reads from the strands encoding the lncRNA transcripts is shown in red (Lower). Solid lines represent transcription in hESCs, and dashed lines represent transcription 48 h after induction of differentiation toward the endoderm. The x axis indicates the linear distance in kilobases, and the y axis indicates the average reads per genomic bin per million uniquely mapped reads.
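
The pair selection in panel F (mRNA GRO-seq signal increasing at least 1.25-fold after 48 h of differentiation) can be illustrated with a small Python sketch. The dictionary keys, the function names, and the pseudocount used to avoid division by zero are assumptions introduced for this example; the source does not state how zero counts were handled or how coordination was scored.

```python
def fold_change(before, after, pseudocount=1.0):
    """Ratio of read counts after vs. before differentiation.
    The pseudocount is an assumption, added to avoid division by zero."""
    return (after + pseudocount) / (before + pseudocount)


def select_induced_pairs(pairs, min_fold=1.25):
    """Keep lncRNA/mRNA pairs whose mRNA signal rises at least min_fold.

    Each pair is a dict with hypothetical keys:
    'mrna_esc', 'mrna_endo', 'lnc_esc', 'lnc_endo' (GRO-seq read counts).
    """
    return [
        p for p in pairs
        if fold_change(p["mrna_esc"], p["mrna_endo"]) >= min_fold
    ]


def fraction_with_induced_lncrna(selected, min_fold=1.25):
    """Fraction of selected pairs whose lncRNA signal also rises by min_fold.
    This is one simple way to summarize coordination, not the paper's metric."""
    if not selected:
        return 0.0
    induced = sum(
        1 for p in selected
        if fold_change(p["lnc_esc"], p["lnc_endo"]) >= min_fold
    )
    return induced / len(selected)
```

A high fraction from `fraction_with_induced_lncrna` would be consistent with the coordinated induction shown in the averaged profiles, although the figure itself reports average read densities rather than a per-pair fraction.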