N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana
Patrick Willems et al. Mol Cell Proteomics. 2017 Jun.
Abstract
Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies that combine N-terminal proteomics and ribosome profiling suggest that a high number of protein start sites are currently missing from genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein-coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and in computational translations of sequences stored in public repositories. Most interestingly, complementary evidence from ribosome profiling was found for 23 protein N termini. Finally, an in silico analysis of protein N-terminal peptides demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes.
© 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Figures

Number of unique peptide sequences identified for (A) trypsin, (B) chymotrypsin, (C) endoproteinase GluC, and (D) endoproteinase AspN digested proteome samples, plotted against different FDR score thresholds (x axis). The combined (red dots) and the individual search engine results are shown (see legend).
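
The kind of count plotted in this figure can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each peptide-to-spectrum match is available as a (peptide sequence, FDR score) pair, that lower scores are better, and the function name and threshold values are hypothetical.

```python
def unique_peptides_at_thresholds(psms, thresholds):
    """Count unique peptide sequences passing each FDR score threshold.

    psms: iterable of (peptide_sequence, fdr_score) pairs (assumed input format).
    thresholds: iterable of FDR score cut-offs; a PSM passes if score <= cut-off.
    """
    psms = list(psms)  # allow generators, since we iterate once per threshold
    return {
        t: len({peptide for peptide, score in psms if score <= t})
        for t in thresholds
    }


# Hypothetical PSMs: the same peptide may be matched by several spectra.
example_psms = [
    ("PEPTIDEK", 0.001),
    ("PEPTIDEK", 0.020),
    ("ANOTHERR", 0.040),
    ("LASTONEK", 0.200),
]
counts = unique_peptides_at_thresholds(example_psms, [0.01, 0.05])
```

Counting sequences rather than PSMs is what keeps a peptide matched by many spectra from inflating the curve.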

Generation of customized Nt-peptide libraries. The TAIR10 genome was translated in six frames by the EMBOSS getorf program (42) or subjected to ab initio gene prediction by Augustus 2.5.5 (44, 45). The resulting protein sequences were digested in silico with each of the four proteases used. Only Nt-peptides starting at position 1 or 2, considering the NME rule, were retained. Peptides matching the TAIR10 proteome or the cRAP database (common Repository of Adventitious Proteins, http://www.thegpm.org/crap/) were omitted. The resulting numbers of non-redundant target peptide sequences are shown for every protease.
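
The library construction described in this caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the protease is reduced to "cleave after K or R", the NME rule is taken to be removal of the initiator Met when residue 2 is one of A, C, G, P, S, T or V (a commonly cited set, assumed here), and all function names are hypothetical.

```python
# Assumption: the initiator Met is excised when the second residue is small.
NME_SMALL_RESIDUES = set("ACGPSTV")


def nt_peptide(protein, cleave_after="KR"):
    """Return the N-terminal peptide of a translated ORF, or None.

    The peptide starts at position 2 when the NME rule applies, else at
    position 1, and ends at the first cleavage site (simplified protease rule).
    """
    if len(protein) < 2 or not protein.startswith("M"):
        return None
    start = 1 if protein[1] in NME_SMALL_RESIDUES else 0
    # End just after the first cleavable residue, or at the protein end.
    end = next(
        (i + 1 for i, aa in enumerate(protein) if aa in cleave_after),
        len(protein),
    )
    return protein[start:end]


def build_library(proteins, known_peptides):
    """Collect non-redundant Nt-peptides that are absent from a known set
    (standing in for the annotated proteome and contaminant database)."""
    candidates = (nt_peptide(p) for p in proteins)
    return {pep for pep in candidates if pep and pep not in known_peptides}


library = build_library(
    ["MASTEELKRLL", "MDLQGRAAK", "MASTEELKQQ", "ASTK"],
    known_peptides={"MDLQGR"},
)
```

Returning a set gives the non-redundancy for free: the first and third ORFs yield the same Nt-peptide and are stored once.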

Peptide identifications pointing to novel TIS. Of all identified novel peptides, solely Nt-modified peptides, i.e. in vitro or in vivo Nt-acetylated and NME-compliant N termini, were considered, resulting in a total of 122 novel Nt-peptides (169 PSMs; supplemental Data Set S1). As additional support, MS2PIP Pearson correlations were computed, where a high correlation is one above the median correlation observed for spectra matching N termini annotated in the TAIR10 database (supplemental Fig. S6).

Positioning of new Nt-peptides in relation to TAIR10 protein-coding gene models (A), pseudogenes (B) or transposable elements (C). Intergenic Nt-peptides, i.e. not overlapping with a TAIR10 transcript, are shown in green, whereas intragenic Nt-peptides are shown in red. The number of identified peptides as well as the N terminus of a peptide (‘N′) are indicated, except for intergenic Nt-peptides.

Novel Nt-peptides matching Augustus predicted gene models and Araport11 annotations. TAIR10, Augustus predicted and Araport11 annotated gene models, in addition to the Nt-peptide sequences identified, were loaded as tracks in the Integrative Genomics Viewer (IGV; 55). Nt-peptide identifications are shown that hint at the expression of Nt-protein extensions (A–C) or that originate from translation initiation at a novel upstream exon (B), an exon extension (C), and a newly identified exon-exon splicing event (D).

Ribo-seq supported Augustus gene models with matching Nt-peptides. Cycloheximide (CHX) and lactimidomycin (LTM) coverage are displayed according to the mapped strand (green: forward strand, blue: reverse strand). TAIR10 and Araport11 gene models and the identified peptides were loaded as tracks in IGV (55). A, Translation evidence of an annotated transposable element. B, Augustus predicted protein extension indicative of a novel splice site (green rectangle). C, Alternative translation giving rise to an Nt-extension of SDF2. The cleavage site (CS) prediction by TargetP (71, 72) is indicated by a dotted line.

Sequence conservation of the start codon and a novel Nt-peptide mapping to an upstream exon of the gene encoding 2OG and Fe(II)-dependent oxygenase (AT4G16765). Cycloheximide (CHX) and lactimidomycin (LTM) coverage are displayed for the reverse strand (blue). TAIR10 and Araport11 gene models and the identified peptides are loaded as tracks in IGV (55). The genomic sequence alignment of the first exon (A. thaliana versus A. lyrata and B. rapa, EnsemblPlants release 33) is displayed, with the start codons printed in bold. The aligned sequences are also translated to amino acids, with the iMet indicated in bold. Non-identical nucleotides or amino acids are displayed in red.

TIS context sequence frequency and scoring. Nucleotide frequency plots of the TIS sequence context (positions −5 to +4) of (A) all TAIR10 transcripts (28,784 nonredundant TIS), (B) the novel Nt-peptides (111 TIS) and (C) Nt-peptides conserved in Brassicaceae (28 TIS). Nucleotide frequency matrices and plots are displayed (left), as well as boxplots showing the nucleotide context score distribution for the respective TIS.
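
A position frequency matrix over the −5 to +4 window can be sketched as below. The window convention (five upstream nucleotides, the start codon at +1 to +3, and position +4, with no position 0) follows the caption. The scoring function is an assumption for illustration only, a log-odds sum against a uniform background, and is not claimed to be the scoring scheme used in the paper. All names are hypothetical.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def tis_context(transcript, start_index):
    """Return the 9-nt context (-5..+4) around a start codon, or None.

    start_index is the 0-based index of the first start codon nucleotide.
    """
    lo, hi = start_index - 5, start_index + 4
    if lo < 0 or hi > len(transcript):
        return None  # window does not fit inside the transcript
    return transcript[lo:hi].upper()


def frequency_matrix(contexts, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length contexts."""
    n = len(contexts)
    matrix = []
    for column in zip(*contexts):
        counts = Counter(column)
        matrix.append({
            nt: (counts.get(nt, 0) + pseudocount) / (n + 4 * pseudocount)
            for nt in NUCLEOTIDES
        })
    return matrix


def context_score(context, matrix):
    """Assumed score: sum of log2 odds versus a uniform 0.25 background.

    Expects an A/C/G/T-only context and a matrix built with pseudocount > 0.
    """
    return sum(math.log2(col[nt] / 0.25) for nt, col in zip(context, matrix))


contexts = ["AAAAAATGG", "CAAAAATGG"]
raw = frequency_matrix(contexts, pseudocount=0.0)   # plain frequencies
smoothed = frequency_matrix(contexts)               # safe for log scoring
```

The pseudocount is there so that a nucleotide never observed at a position does not produce a log of zero when scoring.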
References
- Jaffe J. D., Berg H. C., and Church G. M. (2004) Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4, 59–77
- Zhang B., Wang J., Wang X., Zhu J., Liu Q., Shi Z., Chambers M. C., Zimmerman L. J., Shaddox K. F., Kim S., Davies S. R., Wang S., Wang P., Kinsinger C. R., Rivers R. C., Rodriguez H., Townsend R. R., Ellis M. J., Carr S. A., Tabb D. L., Coffey R. J., Slebos R. J., Liebler D. C., and NCI CPTAC (2014) Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382–387
- Ruggles K. V., Tang Z., Wang X., Grover H., Askenazi M., Teubl J., Cao S., McLellan M. D., Clauser K. R., Tabb D. L., Mertins P., Slebos R., Erdmann-Gilmore P., Li S., Gunawardena H. P., Xie L., Liu T., Zhou J. Y., Sun S., Hoadley K. A., Perou C. M., Chen X., Davies S. R., Maher C. A., Kinsinger C. R., Rodland K. D., Zhang H., Zhang Z., Ding L., Townsend R. R., Rodriguez H., Chan D., Smith R. D., Liebler D. C., Carr S. A., Payne S., Ellis M. J., and Fenyo D. (2016) An analysis of the sensitivity of proteogenomic mapping of somatic mutations and novel splicing events in cancer. Mol. Cell. Proteomics 15, 1060–1071