A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites
Proteomics. 2014 Dec;14(23-24):2688-98.
doi: 10.1002/pmic.201400180. Epub 2014 Oct 2.
Alexander Koch, Daria Gawron, Sandra Steyaert, Elvis Ndah, Jeroen Crappé, Sarah De Keulenaer, Ellen De Meester, Ming Ma, Ben Shen, Kris Gevaert, Wim Van Criekinge, Petra Van Damme, Gerben Menschaert
- PMID: 25156699
- PMCID: PMC4391000
- DOI: 10.1002/pmic.201400180
Alexander Koch et al. Proteomics. 2014 Dec.
Abstract
Next-generation transcriptome sequencing is increasingly integrated with MS to enhance MS-based protein and peptide identification. Recently, a breakthrough in transcriptome analysis was achieved with the development of ribosome profiling (ribo-seq). This technology is based on the deep sequencing of ribosome-protected mRNA fragments, thereby enabling the direct observation of in vivo protein synthesis at the transcript level. In order to explore the impact of a ribo-seq-derived protein sequence search space on MS/MS spectrum identification, we performed a comprehensive proteome study on a human cancer cell line, using both shotgun and N-terminal proteomics, next to ribosome profiling, which was used to delineate (alternative) translational reading frames. By including protein-level evidence of sample-specific genetic variation and alternative translation, this strategy improved the identification score of 69 proteins and identified 22 new proteins in the shotgun experiment. Furthermore, we discovered 18 new alternative translation start sites in the N-terminal proteomics data and observed a correlation between the quantitative measures of ribo-seq and shotgun proteomics with a Pearson correlation coefficient ranging from 0.483 to 0.664. Overall, this study demonstrated the benefits of ribosome profiling for MS-based protein and peptide identification and we believe this approach could develop into a common practice for next-generation proteomics.
Keywords: Bioinformatics; N-terminomics; Proteogenomics; Ribosome profiling; Translation initiation.
© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Figures

Figure 1. Ribo-seq was performed twice on the human colon cancer cell line HCT116: once with cycloheximide (CHX) to halt translation globally and once with lactimidomycin (LTM) to stop translation specifically at translation initiation sites. After translation initiation site (TIS) prediction, the ribo-seq-derived ORFs were translated to create a custom protein sequence database. This database was then combined with the human Swiss-Prot protein sequence database. Proteome samples were prepared from the same HCT116 cells and analyzed using both shotgun proteomics and N-terminal COFRADIC (combined fractional diagonal chromatography). The proteins and peptides in these samples were then identified using the custom combined protein search space.
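The database-combination step in this workflow can be illustrated with a minimal sketch. It assumes both databases are plain FASTA text and that identical sequences should be stored once, with the reference entry taking precedence; the function names (`read_fasta`, `combine_databases`), the `ref|` and `riboseq|` header prefixes and the toy sequences are hypothetical and are not taken from the study's pipeline.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_databases(reference, custom):
    """Merge two FASTA sources, keeping one entry per distinct sequence.

    Reference entries are read first, so a ribo-seq-derived sequence that is
    identical to a reference sequence keeps the reference header (assumption).
    """
    combined = {}
    for source, handle in (("ref", reference), ("riboseq", custom)):
        for header, seq in read_fasta(handle):
            combined.setdefault(seq, f"{source}|{header}")
    return [(hdr, seq) for seq, hdr in combined.items()]


# Toy sequences, invented for illustration only.
swissprot = StringIO(">P1\nMKTAYIAK\n>P2\nMSLLPR\n")
riboseq = StringIO(">tx1_ext\nMAAMKTAYIAK\n>tx2\nMSLLPR\n")

merged = combine_databases(swissprot, riboseq)
for hdr, seq in merged:
    print(f">{hdr}\n{seq}")
```

In this toy case the N-terminally extended ribo-seq entry is added to the search space, while the ribo-seq entry that duplicates a reference sequence is collapsed into the existing reference record.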

Figure 2. a) Shotgun proteomics. Searching the custom combined protein sequence database identified 2,816 proteins. Most of these proteins (2,482 or 88.1%) were picked up by both databases independently, while 312 and 22 proteins were uniquely identified in the Swiss-Prot and ribo-seq databases, respectively. The 22 unique ribo-seq identifications comprised 6 new proteins, 13 proteins with a mutation site and 3 unannotated isoforms. The ribo-seq data also improved the protein identification score of 69 proteins. b) N-terminal COFRADIC. Most of the 1,289 peptides identified with the custom combined protein sequence database mapped to canonical, annotated N-termini (1,071 dbTIS peptides or 83.1%). Of the remaining N-termini, 208 started downstream of the canonical start site (beyond protein position 2), 9 mapped to a 5′-extension and 1 to a uORF. For both the up- and downstream start sites, we identified several near-cognate start sites.
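The counts in this caption can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Shotgun proteomics: proteins found via both databases, or via one only.
shared, swissprot_only, riboseq_only = 2482, 312, 22
total_proteins = shared + swissprot_only + riboseq_only
shared_pct = round(100 * shared / total_proteins, 1)

# Breakdown of the identifications unique to the ribo-seq database.
riboseq_breakdown = {"new protein": 6, "mutation site": 13, "unannotated isoform": 3}

# N-terminal COFRADIC: peptide counts per start-site category.
nterm = {"dbTIS": 1071, "downstream": 208, "5'-extension": 9, "uORF": 1}
total_peptides = sum(nterm.values())
dbtis_pct = round(100 * nterm["dbTIS"] / total_peptides, 1)

print(total_proteins, shared_pct)   # total proteins and shared percentage
print(total_peptides, dbtis_pct)    # total peptides and dbTIS percentage
```

The category counts sum to the reported totals (2,816 proteins and 1,289 peptides), and the two percentages reproduce the reported 88.1% and 83.1%.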

Figure 3. A 5′-extension (Swiss-Prot entry name RBP2_HUMAN) and an N-terminal truncation (Swiss-Prot entry name HNRPL_HUMAN). The UCSC genome browser [57] was used to create the plots of the ribo-seq and N-terminal COFRADIC data. The browser tracks are, from top to bottom: CHX treatment data, LTM treatment data, N-terminal COFRADIC data, UCSC genes, RefSeq genes and human mRNA from GenBank. The different start sites (a: alternative start site, b: canonical start site) are clearly visible in the zoomed genome browser views, as is the three-nucleotide periodicity of the ribo-seq data, especially in the N-terminal truncation image. The MS/MS spectra and sequence fragmentations indicate the confidence and quality of the peptide identifications.

Figure 4. a) Correlation plots of protein abundance estimates based on NSAF values and RPF counts. Top left: all dbTIS transcripts; top right: dbTIS transcripts with a validated MS/MS-based identification (i.e. transcripts with a spectral count value > 2); bottom left: dbTIS transcripts with an RPF count ≥ 200; bottom right: dbTIS transcripts with both a validated MS/MS-based identification and an RPF count ≥ 200. The regression line is shown in green. For each plot, the number of data points used (i.e. the number of dbTIS transcripts) as well as the corresponding Pearson correlation coefficient (r²) is shown. b) Correlation plot with the inclusion of stability data. Only dbTIS transcripts with both a validated MS/MS-based identification and an RPF count ≥ 200 were used (bottom right plot in Figure 4a). Instability indexes were determined with the ProtParam tool [45]: proteins with an instability index < 40 were classified as stable and are shown in blue, whereas proteins with an instability index ≥ 40 were classified as unstable and are shown in orange.
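The quantities compared in this figure can be sketched as follows. NSAF is computed with its standard definition (spectral count divided by protein length, normalized over all proteins), the filters use the thresholds named in the caption (spectral count > 2, RPF count ≥ 200), and proteins are classed by the instability index cutoff of 40. The log10 transformation before correlating is an assumption made here, not something the caption states, and all input values are invented toy data.

```python
import math


def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: (SpC/L) scaled to sum to 1."""
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    return [value / total for value in saf]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def stability_class(instability_index):
    """Classify a protein using the cutoff of 40 given in the caption."""
    return "stable" if instability_index < 40 else "unstable"


# Toy data, invented for illustration only.
spectral_counts = [12, 2, 40, 7, 25]
lengths = [350, 120, 800, 410, 560]
rpf_counts = [900, 250, 5200, 310, 2100]

nsaf_values = nsaf(spectral_counts, lengths)

# Keep entries with a validated identification and sufficient RPF coverage.
keep = [
    i
    for i, (count, rpf) in enumerate(zip(spectral_counts, rpf_counts))
    if count > 2 and rpf >= 200
]

# Assumption: correlate on a log10 scale, since both measures span
# several orders of magnitude.
log_nsaf = [math.log10(nsaf_values[i]) for i in keep]
log_rpf = [math.log10(rpf_counts[i]) for i in keep]
r = pearson_r(log_rpf, log_nsaf)
print(f"n = {len(keep)}, r = {r:.3f}")
```

The second toy entry is dropped by the spectral count filter, which mirrors how the four panels differ only in which transcripts pass the two thresholds.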
Similar articles
- Menschaert G, Van Criekinge W, Notelaers T, Koch A, Crappé J, Gevaert K, Van Damme P. Mol Cell Proteomics. 2013 Jul;12(7):1780-90. doi: 10.1074/mcp.M113.027540. Epub 2013 Feb 21. PMID: 23429522. Free PMC article.
- PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Crappé J, Ndah E, Koch A, Steyaert S, Gawron D, De Keulenaer S, De Meester E, De Meyer T, Van Criekinge W, Van Damme P, Menschaert G. Nucleic Acids Res. 2015 Mar 11;43(5):e29. doi: 10.1093/nar/gku1283. Epub 2014 Dec 15. PMID: 25510491. Free PMC article.
- Vieira de Souza E, L Bookout A, Barnes CA, Miller B, Machado P, Basso LA, Bizarro CV, Saghatelian A. Nat Commun. 2024 Aug 9;15(1):6839. doi: 10.1038/s41467-024-50301-4. PMID: 39122697. Free PMC article.
- The proteome under translational control. Gawron D, Gevaert K, Van Damme P. Proteomics. 2014 Dec;14(23-24):2647-62. doi: 10.1002/pmic.201400165. Epub 2014 Nov 2. PMID: 25263132. Review.
- Identification of Small Novel Coding Sequences, a Proteogenomics Endeavor. Olexiouk V, Menschaert G. Adv Exp Med Biol. 2016;926:49-64. doi: 10.1007/978-3-319-42316-6_4. PMID: 27686805. Review.
Cited by
- Nascent alt-protein chemoproteomics reveals a pre-60S assembly checkpoint inhibitor. Cao X, Khitun A, Harold CM, Bryant CJ, Zheng SJ, Baserga SJ, Slavoff SA. Nat Chem Biol. 2022 Jun;18(6):643-651. doi: 10.1038/s41589-022-01003-9. Epub 2022 Apr 7. PMID: 35393574. Free PMC article.
- Exploring Ribosome-Positioning on Translating Transcripts with Ribosome Profiling. Cope AL, Vellappan S, Favate JS, Skalenko KS, Yadavalli SS, Shah P. Methods Mol Biol. 2022;2404:83-110. doi: 10.1007/978-1-0716-1851-6_5. PMID: 34694605.
- The small peptide world in long noncoding RNAs. Choi SW, Kim HW, Nam JW. Brief Bioinform. 2019 Sep 27;20(5):1853-1864. doi: 10.1093/bib/bby055. PMID: 30010717. Free PMC article. Review.
- Jin J, Meng L, Chen K, Xu Y, Lu P, Li Z, Tao J, Li Z, Wang C, Yang X, Yu S, Yang Z, Cao L, Cao P. Front Plant Sci. 2022 Sep 23;13:971400. doi: 10.3389/fpls.2022.971400. eCollection 2022. PMID: 36212334. Free PMC article.
- Leong AZ, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. J Biomed Sci. 2022 Mar 17;29(1):19. doi: 10.1186/s12929-022-00802-5. PMID: 35300685. Free PMC article. Review.
References
- Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20:3551–3567.
- Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20:1466–1467.
- Geer LY, Markey SP, Kowalak JA, Wagner L, et al. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3:958–964.
- Staes A, Impens F, Van Damme P, Ruttens B, et al. Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nature Protocols. 2011;6:1130–1141.