Genome-wide assembly and analysis of alternative transcripts in mouse
Comparative Study
Alexei A Sharov et al. Genome Res. 2005 May.
Abstract
To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding but multiexon). Comparison of the resulting gene index with the TIGR, UniGene, DoTS, and ESTGenes databases revealed that it had a greater number of transcripts, a greater average number of exons and introns with proper splice sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of transcripts led to a combinatorial table of 23 types of ATS units, only nine of which were previously described: 14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination. Of the 20,323 multiexon protein-coding genes with proper splice sites, 47%, 18%, and 14% had alternative splicing, alternative starts, and alternative termination, respectively. The gene index with comprehensive ATS will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing accurate exon-based DNA microarrays. The sequence data from this study have been submitted to GenBank under accession numbers CK329321-CK334090; CF891695-CF906652; CF906741-CF916750; CK334091-CK347104; CK387035-CK393993; CN660032-CN690720; CN690721-CN725493.
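The summary figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the per-category gene counts are approximations reconstructed from the rounded percentages, not values reported in the source.

```python
# Figures quoted in the abstract.
PROTEIN_CODING_GENES = 27_497
TRANSCRIPTS = 77_138
MULTIEXON_GENES = 20_323  # multiexon protein-coding genes with proper splice sites

# Number of ATS unit types per category, as listed in the abstract.
ATS_TYPES = {
    "alternative splicing": 14,
    "alternative start": 7,
    "alternative termination": 2,
}

# Rounded fractions of multiexon genes showing each ATS category.
ATS_FRACTIONS = {
    "alternative splicing": 0.47,
    "alternative start": 0.18,
    "alternative termination": 0.14,
}


def approx_gene_counts(total, fractions):
    """Approximate gene counts implied by rounded percentages (assumption)."""
    return {name: round(total * frac) for name, frac in fractions.items()}


mean_transcripts = TRANSCRIPTS / PROTEIN_CODING_GENES
total_ats_types = sum(ATS_TYPES.values())
approx_counts = approx_gene_counts(MULTIEXON_GENES, ATS_FRACTIONS)

print(f"transcripts per gene: {mean_transcripts:.1f}")
print(f"ATS unit types: {total_ats_types}")
for name, count in approx_counts.items():
    print(f"~{count} genes with {name}")
```

Because the percentages are rounded to whole numbers, each reconstructed count may be off by roughly a hundred genes in either direction.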
Figures

Frequency distribution of protein-coding genes by the number of exons (A) and transcripts (B), and the relation between the number of exons and average number of transcripts per gene (C). Regression lines are as follows: (A) log(N) = 3.44 - 0.0485x; (B) log(N) = 3.72 - 1.32[log(x)]^2; (C) N = 9.44[1 - exp(-0.0574x)].

Comparison of the NIA Mouse Gene Index with other indexes (UniGene, TIGR, DoTS, and ESTGenes). (A) Gene coverage; (B) transcript coverage; (C) genes missing in other gene indexes; (D) genes missing in the NIA Mouse Gene Index.

Coding system for alternative transcription/splicing (ATS).

Combinatorial classification of ATS units.
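The regression lines given in the first figure caption can be evaluated directly. The following is a minimal sketch: the function names are hypothetical, and it assumes that log denotes the base-10 logarithm and that exp is the natural exponential, neither of which the caption states.

```python
import math


def genes_by_exons(x):
    """Panel A fit: N for genes with x exons (assumes log is base 10)."""
    return 10 ** (3.44 - 0.0485 * x)


def genes_by_transcripts(x):
    """Panel B fit: N for genes with x transcripts (assumes log is base 10)."""
    return 10 ** (3.72 - 1.32 * math.log10(x) ** 2)


def mean_transcripts_by_exons(x):
    """Panel C fit: average number of transcripts for a gene with x exons."""
    return 9.44 * (1 - math.exp(-0.0574 * x))


if __name__ == "__main__":
    for exons in (1, 5, 10, 20, 50):
        print(
            exons,
            round(genes_by_exons(exons)),
            round(mean_transcripts_by_exons(exons), 2),
        )
```

Under the panel C fit, the average number of transcripts per gene rises with exon count and levels off toward 9.44.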
Web site references
- http://lgsun.grc.nia.nih.gov/geneindex4/; NIA Mouse Gene Index.
- http://www.sanger.ac.uk/Software/formats/GFF/; GFF format.
- http://lgsun.grc.nia.nih.gov/geneindex4/download.html; All data and software.