Comparative Study

Genome-wide assembly and analysis of alternative transcripts in mouse

Alexei A Sharov et al. Genome Res. 2005 May.

Abstract

To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for assembling transcripts from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., multiexon but nonprotein-coding). Compared with the TIGR, UniGene, DoTS, and ESTGenes databases, the resulting gene index had more transcripts, a greater average number of exons and introns with proper splice sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of the transcripts yielded a combinatorial table of 23 types of ATS units (14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination), only nine of which had been described previously. Of the 20,323 multiexon protein-coding genes with proper splice sites, 47% had alternative splicing, 18% had alternative starts, and 14% had alternative terminations. The gene index, with its comprehensive ATS coverage, will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing accurate exon-based DNA microarrays. The sequence data from this study have been submitted to GenBank under accession numbers: CK329321-CK334090; CF891695-CF906652; CF906741-CF916750; CK334091-CK347104; CK387035-CK393993; CN660032-CN690720; CN690721-CN725493.
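
The abstract distinguishes alternative splicing, alternative starts, and alternative terminations. As a rough illustration only (this is not the authors' pipeline or their 23-type coding system), the sketch below compares two transcripts of the same gene and assigns coarse labels: a different transcript start, a different transcript end, or a different set of introns within the region both transcripts cover. The representation of a transcript as a sorted list of (start, end) exon coordinates on the plus strand, and the names `introns` and `classify_pair`, are hypothetical assumptions.

```python
# Hypothetical representation (not from the paper): a transcript is a list of
# (start, end) exon coordinates, 1-based inclusive, sorted, on the plus strand.
Exon = tuple[int, int]


def introns(exons: list[Exon]) -> set[tuple[int, int]]:
    """Return the intron intervals lying between consecutive exons."""
    return {(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])}


def classify_pair(t1: list[Exon], t2: list[Exon]) -> set[str]:
    """Coarse ATS labels for two transcripts of the same plus-strand gene.

    Simplified sketch only; far coarser than the paper's 23 ATS types.
    """
    labels: set[str] = set()
    if t1[0][0] != t2[0][0]:
        labels.add("alternative start")
    if t1[-1][1] != t2[-1][1]:
        labels.add("alternative termination")

    # Compare only introns lying entirely inside the interval covered by
    # both transcripts, so an extra upstream or downstream exon alone is
    # not reported as a splicing difference.
    lo = max(t1[0][0], t2[0][0])
    hi = min(t1[-1][1], t2[-1][1])

    def inside(ivs: set[tuple[int, int]]) -> set[tuple[int, int]]:
        return {iv for iv in ivs if lo <= iv[0] and iv[1] <= hi}

    if inside(introns(t1)) != inside(introns(t2)):
        labels.add("alternative splicing")
    return labels


ref = [(100, 200), (300, 400), (500, 600)]
skip = [(100, 200), (500, 600)]                 # middle exon skipped
longer = [(50, 200), (300, 400), (500, 700)]    # different start and end
print(sorted(classify_pair(ref, skip)))    # ['alternative splicing']
print(sorted(classify_pair(ref, longer)))  # ['alternative start', 'alternative termination']
```

The labels remain coarse and can overlap: mutually exclusive first exons, for example, register as both an alternative start and a splicing difference. For minus-strand genes, start and termination would swap.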

Figures

Figure 1.

Frequency distribution of protein-coding genes by the number of exons (A) and transcripts (B), and the relation between the number of exons and average number of transcripts per gene (C). Regression lines are as follows: (A) $\log(N) = 3.44 - 0.0485x$; (B) $\log(N) = 3.72 - 1.32[\log(x)]^2$; (C) $N = 9.44[1 - \exp(-0.0574x)]$.
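
The three fitted curves can be evaluated directly. In this sketch, x is the number of exons per gene in (A) and (C) and the number of transcripts per gene in (B); N is a gene count in (A) and (B) and the average number of transcripts per gene in (C). The caption does not state the base of "log", so base 10 is an assumption here; the function names are hypothetical.

```python
import math

# Fitted regression curves from the Figure 1 caption.
# ASSUMPTION: "log" means log base 10; the caption does not state the base.


def genes_by_exon_count(x: float) -> float:
    """(A) Expected number of genes with x exons: log N = 3.44 - 0.0485x."""
    return 10 ** (3.44 - 0.0485 * x)


def genes_by_transcript_count(x: float) -> float:
    """(B) Expected number of genes with x transcripts: log N = 3.72 - 1.32[log x]^2."""
    return 10 ** (3.72 - 1.32 * math.log10(x) ** 2)


def mean_transcripts_per_gene(x: float) -> float:
    """(C) Average transcripts per gene with x exons: N = 9.44[1 - exp(-0.0574x)]."""
    return 9.44 * (1 - math.exp(-0.0574 * x))


for x in (1, 5, 10, 20):
    print(x,
          round(genes_by_exon_count(x)),
          round(genes_by_transcript_count(x)),
          round(mean_transcripts_per_gene(x), 2))
```

Curve (C) does not depend on the log-base assumption; it levels off at 9.44 transcripts per gene as the number of exons grows.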

Figure 2.

Comparison of the NIA Mouse Gene Index with other indexes (UniGene, TIGR, DoTS, and ESTGenes). (A) Gene coverage; (B) transcript coverage; (C) genes missing in other gene indexes; (D) genes missing in the NIA Mouse Gene Index.
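
Panels (C) and (D) amount to set differences between gene indexes. The sketch below assumes, hypothetically, that genes from every index have already been matched to shared identifiers; the caption does not say how genes were matched across indexes. `compare_indexes` and the toy IDs are illustrative names, not part of any of these databases.

```python
def compare_indexes(nia: set[str], other: set[str]) -> dict[str, int]:
    """Count shared and index-specific genes, given pre-matched gene IDs.

    Hypothetical helper: assumes genes in both indexes already carry
    common identifiers.
    """
    return {
        "shared": len(nia & other),
        "missing_in_other": len(nia - other),  # analogous to panel (C)
        "missing_in_nia": len(other - nia),    # analogous to panel (D)
    }


nia_index = {"g1", "g2", "g3", "g4"}
other_index = {"g2", "g3", "g5"}
print(compare_indexes(nia_index, other_index))
# {'shared': 2, 'missing_in_other': 2, 'missing_in_nia': 1}
```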

Figure 3.

Coding system for alternative transcription/splicing (ATS).

Figure 4.

Combinatorial classification of ATS units.
