Analysis of the cDNAs of hypothetical genes on Arabidopsis chromosome 2 reveals numerous transcript variants
Yong-Li Xiao et al. Plant Physiol. 2005 Nov.
Abstract
In the fully sequenced Arabidopsis (Arabidopsis thaliana) genome, many gene models are annotated as "hypothetical protein," whose gene structures are predicted solely by computer algorithms with no support from either expressed sequence matches from Arabidopsis, or nucleic acid or protein homologs from other species. To confirm their existence and predicted gene structures, a high-throughput method of rapid amplification of cDNA ends (RACE) was used to obtain their cDNA sequences from 11 cDNA populations. Primers were designed for all 797 hypothetical genes on chromosome 2, and, through 5' and 3' RACE, clones from 506 genes were sequenced and cDNA sequences from 399 target genes were recovered. The cDNA sequences were obtained by assembling their 5' and 3' RACE polymerase chain reaction products. These sequences revealed that (1) the structures of 151 hypothetical genes were different from their predictions; (2) 116 hypothetical genes had alternatively spliced transcripts and 187 genes displayed polyadenylation sites; and (3) there were transcripts arising from both strands, transcripts from the strand opposite to that of the prediction, and possible dicistronic transcripts. Promoters from five randomly chosen hypothetical genes (At2g02540, At2g31270, At2g33640, At2g35550, and At2g36340) were cloned into reporter constructs, and their expression was tissue or developmental stage specific. Our results indicate that at least 50% of hypothetical genes on chromosome 2 are expressed in the cDNA populations, with about 38% of the gene structures differing from their predictions. Thus, by using this targeted approach, high-throughput RACE, we revealed numerous transcripts, including many uncharacterized variants, from these hypothetical genes.
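The percentages quoted in the abstract can be checked against its gene counts. The short sketch below is not from the paper. It assumes that the "at least 50%" figure is the 399 recovered genes out of the 797 targeted, and that the "about 38%" figure is the 151 genes with differing structures out of the 399 recovered. The variable and function names are illustrative only.

```python
# Gene counts as reported in the abstract.
TARGETED = 797           # hypothetical genes on chromosome 2 with primers designed
SEQUENCED = 506          # genes with sequenced RACE clones
RECOVERED = 399          # genes with recovered cDNA sequences
STRUCTURE_DIFFERS = 151  # genes whose structure differed from the prediction


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumed denominators (not stated explicitly in the abstract):
# expression rate is relative to all targeted genes, and the
# structure-difference rate is relative to genes with recovered cDNAs.
summary = {
    "sequenced_of_targeted": percent(SEQUENCED, TARGETED),
    "recovered_of_targeted": percent(RECOVERED, TARGETED),
    "structure_differs_of_recovered": percent(STRUCTURE_DIFFERS, RECOVERED),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}%")
```

Under these assumptions the recovered fraction rounds to 50% and the structure-difference fraction to 38%, matching the figures in the abstract.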
Figures

Flow scheme and outcome of the RACE process. The figure shows the numbers of genes for which primers were designed, the success rate for cloning and sequencing, and the outcome in terms of numbers of assemblies produced and their relationship to the original gene predictions.

Examples of comparison of experimentally derived cDNA sequences to the predictions. A to H, Different types of relationships between the predicted gene structure (identified by AGI identifier: At2g#####) and that inferred from RACE sequence (Assembly). The continuous upper line indicates the uninterrupted genomic sequence, the second line the spliced alignment of the predicted hypothetical gene model to the genomic sequence, and subsequent lines the spliced alignments of RACE-derived sequence. Vertical dotted lines highlight differences in splice site locations. All the predicted gene structures are annotation Version 1. The GenBank accession numbers are At2g02320 assembly, AY168990; At2g33400 assembly, AY144106; At2g10850 assembly, AY501348; At2g01240 assembly, AY500325; At2g03020 assembly, AY219083; At2g06800 assembly, AY464642; At2g01960 assembly, AY168991; and At2g04850 assembly, DQ069802.

Example of an alternatively spliced transcript. Notation for spliced alignments as for Figure 2. Vertical dotted lines indicate the starts and stops of CDS. Horizontal arrowed, dotted lines indicate the CDS direction of transcription and the sizes of the CDS. The predicted gene structures are annotation Version 1. The GenBank accession numbers are assembly1, AY231412; assembly2, AY231413; and assembly3, AY231414.

Examples of transcripts from strands opposite to the original prediction. Solid lines are genomic sequences in genomic strands or exon sequences in prediction and assemblies. Dotted lines show the alignment of intron/exon boundaries. Splice site, GT-AG or CT-AC, is shown in genomic strands. Poly(T) tails are shown as TTTTT in assemblies. Start and stop codons are shown in predicted CDS. The predicted gene structures are annotation Version 1. None of the opposite-strand transcripts shown here has a good ORF. The GenBank accession numbers are transcript on the opposite strand of At2g39975 (assembly1), DQ069850; transcript on the opposite strand of At2g24480 (assembly2), DQ069848; and transcript on the opposite strand of At2g24480 (assembly3), DQ069849.

Three examples of transcripts recovered from both strands. Notation as for Figure 4. Poly(A) or poly(T) tails are shown as AAAAA or TTTTT in assemblies. The predicted gene structures are annotation Version 1. The GenBank accession numbers are At2g32890 assembly2, DQ069836; At2g32890 assembly4 (opposite-strand transcript), DQ069843; At2g03460 assembly1, AY501354; At2g03460 assembly2 (opposite-strand transcript), DQ069851; At2g27160 assembly1, DQ069824; and At2g27160 assembly4 (opposite-strand transcript), DQ069842.

Four genomic regions that give rise to dicistronic transcripts. Notation as for Figure 4. The start and stop codons and lengths of each ORF are shown. The stop codon of ORF1 in the region of At2g14350 (D) lies within the third intron of the originally predicted hypothetical gene. The predicted gene structures are annotation Version 1. The GenBank accession numbers are At2g18200 assembly, DQ069798; At2g07708 assembly, DQ069799; At2g10070 assembly, DQ069800; and At2g14350 assembly, AY227649.

Examples of gene models that require merging. Notation as for Figure 4. In each case, a single transcript encompasses both predicted ORFs and can be translated as a single protein. The GenBank accession numbers are At2g40316 assembly1, DQ069793; At2g40316 assembly2, AY231418; At2g40316 assembly3, DQ069794; At2g27650 assembly1, DQ069795; At2g27650 assembly2, DQ069796; and At2g47720 assembly, DQ069797.

Expression patterns of tested hypothetical genes. A, Constructs developed for testing the expression of hypothetical genes. RB and LB, T-DNA right and left borders; NPTII, kanamycin resistance gene; HG promoter, hypothetical gene promoter; NOS-ter, nos terminator; mGAL-VP16, GAL4-VP16 gene with modified codon usage; mGFP-ER, modified GFP with increased fluorescent properties and targeted to the endoplasmic reticulum. B, At2g02540 is expressed in vascular tissue. C, At2g31270 is expressed in carpels and petals. D, At2g33640 is expressed in pollen. E, At2g35550 is expressed in carpels, stamens, young anthers, and petals. F, At2g36340 is expressed in young anthers and some young seeds. Arrows indicate absent expression in three flowers.