Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting

Claudia Fritsch et al. Genome Res. 2012 Nov.

Abstract

So far, the annotation of translation initiation sites (TISs) has been based mostly upon bioinformatics rather than experimental evidence. We adapted ribosomal footprinting to puromycin-treated cells to generate a transcriptome-wide map of TISs in a human monocytic cell line. A neural network was trained on the ribosomal footprints observed at previously annotated AUG translation initiation codons (TICs), and used for the ab initio prediction of TISs in 5062 transcripts with sufficient sequence coverage. Functional interpretation suggested 2994 novel upstream open reading frames (uORFs) in the 5' UTR, 1406 uORFs overlapping with the coding sequence, and 546 N-terminal protein extensions. The TIS detection method was validated on the basis of previously published alternative TISs and uORFs. Among primates, TICs in newly annotated TISs were significantly more conserved than control codons, both for AUGs and near-cognate codons. The transcriptome-wide map of novel candidate TISs derived as part of the study will shed further light on the way in which human proteome diversity is influenced by alternative translation initiation and regulation.

Figures

Figure 1.

Enrichment of THP-1 cell ribosomal footprint data for TISs, following puromycin treatment. (A) Polysome profile of the TPP1 gene in control and puromycin-treated THP-1 cells. (B) Pooled read coverage for the 500 most highly expressed genes. Transcript-specific coverage values were normalized to the total number of reads for each gene and the transcript length was scaled to 1000 bp for all RefSeq sequences. (C) Pooled read coverage around the annotated AUG TICs of the 500 most highly expressed genes in puromycin-treated cells.
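The pooling described in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-nucleotide read coverage is already available as a list per transcript, selects the most highly expressed transcripts by total read count, normalizes each profile to its own read total, and rescales it to 1000 positions by nearest-position sampling. The function names and the sampling scheme are hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence

N_BINS = 1000  # transcript length is rescaled to 1000 positions, as in the caption


def top_expressed(coverage: Dict[str, Sequence[float]], n: int = 500) -> List[str]:
    """Return the IDs of the n transcripts with the highest total read count."""
    return sorted(coverage, key=lambda tid: sum(coverage[tid]), reverse=True)[:n]


def rescale(profile: Sequence[float], n_bins: int = N_BINS) -> List[float]:
    """Map a per-nucleotide profile onto n_bins positions by nearest-position sampling."""
    length = len(profile)
    if length == 0:
        raise ValueError("empty coverage profile")
    # (i * length) // n_bins is always < length for 0 <= i < n_bins
    return [profile[(i * length) // n_bins] for i in range(n_bins)]


def pooled_coverage(profiles: Sequence[Sequence[float]], n_bins: int = N_BINS) -> List[float]:
    """Normalize each profile to its own read total, rescale it, and sum position-wise."""
    pooled = [0.0] * n_bins
    for profile in profiles:
        total = sum(profile)
        if total == 0:
            continue  # transcripts without reads contribute nothing
        for i, value in enumerate(rescale([c / total for c in profile], n_bins)):
            pooled[i] += value
    return pooled
```

For example, `pooled_coverage([coverage[t] for t in top_expressed(coverage)])` yields one 1000-position metagene profile. Normalizing per transcript keeps a few very highly expressed genes from dominating the pooled shape.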

Figure 2.

Algorithm used for the functional classification of neural network-predicted TISs as “annotated TIS,” “N-terminal protein extension,” “upstream ORF (uORF),” or “CDS-overlapping uORF.” The respective AUG or near-cognate codon was searched for in a ±3-bp window around the merged positive TIS signal emitted by the neural network. For each transcript, the depicted algorithm is applied until no network-predicted TISs remain to be classified.
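The two steps in this caption, locating a TIC near the network signal and assigning a functional category, can be sketched as below. The flowchart itself is not reproduced here, so this is a simplified reconstruction from the standard category definitions, not the authors' exact algorithm. Assumptions: sequences are DNA strings of the transcript, positions are 0-based, near-cognate codons are taken to be the nine single-nucleotide variants of ATG, and the tie-breaking rule in `find_tic` (AUG first, then closest to the signal) is a hypothetical choice.

```python
from typing import Optional, Set

STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognates(start: str = "ATG") -> Set[str]:
    """All codons that differ from the start codon at exactly one position."""
    return {start[:i] + b + start[i + 1:] for i in range(3) for b in "ACGT" if b != start[i]}


NEAR_COGNATE = near_cognates()


def find_tic(seq: str, signal_pos: int, window: int = 3) -> Optional[int]:
    """Pick a TIC within +/- window nt of the network signal.

    Assumed tie-breaking (illustrative only): ATG beats near-cognates,
    then the codon closest to the signal, then the most upstream one.
    """
    best = None
    lo = max(0, signal_pos - window)
    hi = min(len(seq) - 3, signal_pos + window)
    for p in range(lo, hi + 1):
        codon = seq[p:p + 3]
        if codon == "ATG":
            rank = 0
        elif codon in NEAR_COGNATE:
            rank = 1
        else:
            continue
        key = (rank, abs(p - signal_pos), p)
        if best is None or key < best:
            best = key
    return None if best is None else best[2]


def first_stop(seq: str, start: int) -> Optional[int]:
    """Position of the first in-frame stop codon downstream of start, if any."""
    for p in range(start + 3, len(seq) - 2, 3):
        if seq[p:p + 3] in STOP_CODONS:
            return p
    return None


def classify(seq: str, tic: int, cds_start: int) -> str:
    """Assign a category relative to the annotated CDS start codon."""
    if tic == cds_start:
        return "annotated TIS"
    if tic > cds_start:
        return "unclassified"  # downstream TISs are outside this sketch
    stop = first_stop(seq, tic)
    in_frame = (cds_start - tic) % 3 == 0
    if in_frame and (stop is None or stop > cds_start):
        # reads through into the annotated CDS without an intervening stop
        return "N-terminal protein extension"
    if stop is not None and stop + 3 <= cds_start:
        # ORF terminates entirely within the 5' UTR
        return "uORF"
    return "CDS-overlapping uORF"
```

Applied per transcript, one would call `find_tic` for each merged network signal and pass the resulting position to `classify`, repeating until every signal has been processed, as the caption describes.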

Figure 3.

Codon usage and functional classification of neural network-defined TISs: (A) Distribution of the number of putative TISs per transcript. (B) Functional classification of putative TISs. (C) TIC usage in putative TISs, including AUG (upper row) and for near-cognate codons only (bottom row). (D) Average codon frequency over all three reading frames in the analyzed 5′ UTRs, considering all possible codons (left), only the 10 TIS-relevant codons identified in our study (middle), or only near-cognate codons (right).
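A background codon frequency of the kind shown in panel (D) can be computed roughly as follows. This sketch pools counts across all input 5′ UTRs rather than averaging per UTR, since the exact averaging is not stated here; it also assumes that near-cognate means a single-nucleotide variant of AUG. The 10 TIS-relevant codons are not listed in this section, so any set passed as `codons` (for instance AUG plus its near-cognates) is a hypothetical stand-in.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def near_cognates(start: str = "ATG") -> Set[str]:
    """Codons differing from the start codon at exactly one position (9 for ATG)."""
    return {start[:i] + b + start[i + 1:] for i in range(3) for b in "ACGT" if b != start[i]}


def codon_frequencies(utrs: Iterable[str], codons: Optional[Set[str]] = None) -> Dict[str, float]:
    """Relative codon frequencies pooled over all three reading frames of each UTR.

    Sliding a 3-nt window one position at a time visits frames 0, 1 and 2.
    If `codons` is given, only those codons are counted and normalized.
    """
    counts: Counter = Counter()
    for seq in utrs:
        seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
        for p in range(len(seq) - 2):
            codon = seq[p:p + 3]
            if codons is None or codon in codons:
                counts[codon] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}
```

The three panels then correspond to `codon_frequencies(utrs)`, `codon_frequencies(utrs, tis_codons)` for a chosen codon set, and `codon_frequencies(utrs, near_cognates())`.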

Figure 4.

Gene-based examples of TIS annotation for the AMD1 (A) and TOP2A (B) genes in the presented data set. Screenshots from the online resource (annotation tracks at http://gengastro.1med.uni-kiel.de/suppl/footprint/) are provided. The network-identified TISs are marked in gray and numbered consecutively along the genome assembly for each RefSeq sequence. The classification of each TIS according to the algorithm in Figure 2 is noted in red. In addition to the markings provided in the online resource, open reading frames are highlighted with red arrows for uORFs, a blue arrow for the N-terminal protein extension of TOP2A, and green arrows for the annotated CDSs of the two genes. (A) A previously known uORF at network-predicted TIS NM_001634:1. An additional network-predicted TIS lies inside this uORF and was therefore not annotated independently, as noted in the Results. (B) A novel uORF at network-predicted TIS NM_001067:3 and a novel N-terminal protein extension at network-predicted TIS NM_001067:2. The annotated AUG TISs are detected in both genes.

Figure 5.

Primate conservation analysis of TICs at neural network-predicted TISs. For each functional category and codon type, the difference in mean Conservation Score across nine primate species between case and control TICs is depicted, with its 95% confidence interval. For comparison, the corresponding difference for the annotated AUG TICs (open box) is also included in each category. Numbers below the boxes give the number of predicted TISs in the respective TIS-by-TIC category. TICs showing statistically significant conservation after Bonferroni correction (31 tests, P < 0.0016) are marked by an asterisk. Further details of the primate conservation analysis are provided in Supplemental Table 3.
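The significance cutoff quoted above follows directly from the Bonferroni rule: with 31 tests at a family-wise α of 0.05, each test must reach P < 0.05/31 ≈ 0.0016. The sketch below shows that arithmetic together with a difference in means and a 95% confidence interval. The interval uses a normal approximation with unpooled variances, which is an assumption for illustration; the caption does not state how the published intervals were computed, and the score lists are placeholders rather than study data.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

ALPHA = 0.05   # family-wise error rate (assumed conventional value)
N_TESTS = 31   # number of tests stated in the caption


def bonferroni_threshold(alpha: float = ALPHA, n_tests: int = N_TESTS) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value: float, alpha: float = ALPHA, n_tests: int = N_TESTS) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


def mean_difference_ci(
    case: Sequence[float], control: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Difference in means (case - control) with a normal-approximation 95% CI."""
    diff = mean(case) - mean(control)
    se = math.sqrt(stdev(case) ** 2 / len(case) + stdev(control) ** 2 / len(control))
    return diff, diff - z * se, diff + z * se
```

A category would be starred when `is_significant(p)` holds for its test; the interval from `mean_difference_ci` corresponds to the whiskers around each box.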
