
A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites

Proteomics. 2014 Dec;14(23-24):2688-98.

doi: 10.1002/pmic.201400180. Epub 2014 Oct 2.

Alexander Koch, Daria Gawron, Sandra Steyaert, Elvis Ndah, Jeroen Crappé, Sarah De Keulenaer, Ellen De Meester, Ming Ma, Ben Shen, Kris Gevaert, Wim Van Criekinge, Petra Van Damme, Gerben Menschaert


PMID: 25156699
PMCID: PMC4391000
DOI: 10.1002/pmic.201400180


Alexander Koch et al. Proteomics. 2014 Dec.

Abstract

Next-generation transcriptome sequencing is increasingly integrated with MS to enhance MS-based protein and peptide identification. Recently, a breakthrough in transcriptome analysis was achieved with the development of ribosome profiling (ribo-seq). This technology is based on the deep sequencing of ribosome-protected mRNA fragments, thereby enabling the direct observation of in vivo protein synthesis at the transcript level. To explore the impact of a ribo-seq-derived protein sequence search space on MS/MS spectrum identification, we performed a comprehensive proteome study on a human cancer cell line using both shotgun and N-terminal proteomics, alongside ribosome profiling, which was used to delineate (alternative) translational reading frames. By including protein-level evidence of sample-specific genetic variation and alternative translation, this strategy improved the identification score of 69 proteins and identified 22 new proteins in the shotgun experiment. Furthermore, we discovered 18 new alternative translation start sites in the N-terminal proteomics data and observed a correlation between the quantitative measures of ribo-seq and shotgun proteomics, with Pearson correlation coefficients ranging from 0.483 to 0.664. Overall, this study demonstrates the benefits of ribosome profiling for MS-based protein and peptide identification, and we believe this approach could develop into common practice in next-generation proteomics.

Keywords: Bioinformatics; N-terminomics; Proteogenomics; Ribosome profiling; Translation initiation.


Figures

**Figure 1. Proteogenomic strategy for the identification of proteins and peptides using a Swiss-Prot/ribo-seq-derived database**
Ribo-seq was performed twice on the human colon cancer cell line HCT116: once with cycloheximide (CHX), which halts translation globally, and once with lactimidomycin (LTM), which stalls ribosomes specifically at translation initiation sites (TISs). After TIS prediction, the ribo-seq-derived ORFs were translated into a custom protein sequence database, which was then combined with the human Swiss-Prot protein sequence database. Proteome samples were prepared from the same HCT116 cells and analyzed by both shotgun proteomics and N-terminal COFRADIC (combined fractional diagonal chromatography). The proteins and peptides in these samples were then identified by searching this combined protein sequence database.
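The database-construction step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: ORF nucleotide sequences are assumed to be already extracted from the predicted TISs, the helper names (`translate_orf`, `combine_search_space`, `to_fasta`) and the `ribo|` header prefix are hypothetical, and skipping ribo-seq translations identical to a Swiss-Prot sequence is only one plausible way to keep the search space non-redundant. The first codon is always translated as methionine, since initiator tRNA decodes near-cognate start codons as Met.

```python
from itertools import product

# NCBI standard genetic code (translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(cds: str) -> str:
    """Translate an ORF that begins at a (possibly near-cognate) start codon.

    The first codon is read as Met (initiator tRNA decodes CTG, GTG, etc. as Met).
    Translation stops at the first in-frame stop codon; a trailing partial codon
    is ignored, and codons with ambiguous bases become 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = "M" if i == 0 else CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def combine_search_space(swissprot: dict[str, str], ribo_orfs: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: merge Swiss-Prot entries with translated ribo-seq ORFs.

    Assumption: translations identical to an existing sequence are skipped so
    the combined search space holds no redundant entries.
    """
    combined = dict(swissprot)
    seen = set(swissprot.values())
    for orf_id, cds in ribo_orfs.items():
        protein = translate_orf(cds)
        if protein and protein not in seen:
            combined[f"ribo|{orf_id}"] = protein
            seen.add(protein)
    return combined


def to_fasta(entries: dict[str, str], width: int = 60) -> str:
    """Serialize header -> sequence pairs as FASTA, wrapping sequences at `width`."""
    lines = []
    for header, seq in entries.items():
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Writing both sources to one FASTA file lets a single search engine run cover canonical and ribo-seq-derived sequences together, which is what a combined search space is meant to achieve.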

**Figure 2. Bar charts showing the number of protein and peptide identifications obtained from the shotgun proteomics and N-terminal COFRADIC experiments**
**a) Shotgun proteomics.** The custom combined protein sequence database resulted in the identification of 2,816 proteins. Most of these proteins (2,482, or 88.1%) were identified with both databases independently, whereas 312 and 22 proteins were uniquely identified with the Swiss-Prot and ribo-seq databases, respectively. The 22 unique ribo-seq identifications comprised six new proteins, 13 proteins with a mutation site and three unannotated isoforms. The ribo-seq data also improved the identification score of 69 proteins. **b) N-terminal COFRADIC.** Most of the 1,289 peptides identified with the custom combined protein sequence database mapped to canonical, annotated N-termini (1,071 dbTIS peptides, or 83.1%). Of the remaining 218 N-termini, 208 started downstream of the canonical start site (beyond protein position 2), nine mapped to a 5′-extension and one to a uORF. For both the upstream and downstream start sites, several near-cognate start codons (non-AUG codons differing from AUG at a single position) were identified.
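The N-terminal categories in panel b can be expressed as a small classifier. This sketch assumes each identified N-terminal peptide has already been mapped to a transcript and that the offset, in nucleotides, of its first codon from the canonical start codon is known; the function names and the `"other"` label are hypothetical. Treating every out-of-frame upstream start as a uORF is a simplification, since a real uORF call also depends on where that ORF terminates.

```python
from collections import Counter


def classify_n_terminus(offset_nt: int) -> str:
    """Assign an N-terminal peptide to one of the categories used in panel b.

    offset_nt: distance in nucleotides from the first base of the annotated
    start codon to the first base of the codon encoding the peptide's first
    residue. Negative values lie upstream, in the 5' UTR.
    """
    if offset_nt % 3 != 0:
        # Simplification: any out-of-frame upstream start is treated as a uORF.
        return "uORF" if offset_nt < 0 else "other"
    position = offset_nt // 3 + 1  # 1-based position in the canonical protein
    if position in (1, 2):
        # Position 2 covers peptides whose initiator Met was removed.
        return "dbTIS"
    return "downstream" if position > 2 else "5'-extension"


def tally(offsets_nt: list[int]) -> Counter:
    """Count peptides per category, as summarized in the bar chart."""
    return Counter(classify_n_terminus(o) for o in offsets_nt)
```

For example, an offset of 3 maps to protein position 2 and is counted as dbTIS, while an in-frame offset of -3 lands at position 0 and is counted as a 5′-extension.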

**Figure 3. Depiction of two different N-termini that were predicted by ribo-seq and identified using N-terminal COFRADIC**
The figure shows a 5′-extension (Swiss-Prot entry name RBP2_HUMAN) and an N-terminal truncation (Swiss-Prot entry name HNRPL_HUMAN). The UCSC genome browser [57] was used to plot the ribo-seq and N-terminal COFRADIC data; the browser tracks are, from top to bottom: CHX treatment data, LTM treatment data, N-terminal COFRADIC data, UCSC genes, RefSeq genes and human mRNAs from GenBank. The different start sites (a: alternative start site; b: canonical start site) are clearly visible in the zoomed genome browser views, as is the three-nucleotide periodicity of the ribo-seq data, especially in the N-terminal truncation image. The MS/MS spectra and annotated fragment ions indicate the confidence and quality of the peptide identifications.
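The three-nucleotide periodicity visible in the browser tracks can be quantified as the fraction of ribosome P-sites that fall in each reading frame of an ORF. This is a minimal sketch, assuming P-site positions have already been derived from read 5′ ends with an appropriate offset (a step not described in this section); `frame_fractions` is a hypothetical helper, not a tool used in the study.

```python
from collections import Counter


def frame_fractions(p_sites: list[int], cds_start: int, cds_end: int) -> list[float]:
    """Fraction of P-sites in frames 0, 1 and 2 of an ORF.

    p_sites: 0-based transcript coordinates of read P-sites (assumed already
    offset-corrected). cds_start / cds_end: 0-based start and exclusive end of
    the ORF. Reads outside the ORF are ignored. Strong periodicity puts most
    reads in frame 0 relative to the true start codon.
    """
    counts = Counter((p - cds_start) % 3 for p in p_sites if cds_start <= p < cds_end)
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]
```

Evaluating the same reads against two candidate start codons (for instance a canonical and an alternative TIS) shows which one keeps the signal in frame 0.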

**Figure 4**
**a) Correlation plots of protein abundance estimates based on normalized spectral abundance factor (NSAF) values and ribosome-protected fragment (RPF) counts.** Top left: all dbTIS transcripts; top right: dbTIS transcripts with a validated MS/MS-based identification (i.e. transcripts with a spectral count > 2); bottom left: dbTIS transcripts with an RPF count ≥ 200; bottom right: dbTIS transcripts with both a validated MS identification and an RPF count ≥ 200. The regression line is shown in green. Each plot shows the number of data points used (i.e. the number of dbTIS transcripts) and the corresponding Pearson correlation coefficient (r²). **b) Correlation plot with the inclusion of stability data.** Only dbTIS transcripts with both a validated MS/MS-based identification and an RPF count ≥ 200 were used (bottom right plot in Figure 4a). Instability indices were determined with the ProtParam tool [45]: proteins with an instability index < 40 were classified as stable and are shown in blue, whereas proteins with an instability index ≥ 40 were classified as unstable and are shown in orange.
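The quantities behind Figure 4 can be sketched as follows. NSAF is computed with its usual definition (spectral count divided by protein length, normalized over all proteins in the sample), and the thresholds (spectral count > 2, RPF count ≥ 200, instability index 40) are taken from the caption. The `Transcript` record, the helper names and the choice to correlate untransformed values are assumptions for illustration; this section does not say whether the plotted values were log-transformed.

```python
import math
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    spectral_count: int  # MS/MS spectral count of the encoded protein
    length: int          # protein length in residues
    rpf_count: int       # ribo-seq reads (RPFs) assigned to the dbTIS ORF


def nsaf(transcripts: list[Transcript]) -> dict[str, float]:
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), normalized over the whole sample."""
    saf = {t.name: t.spectral_count / t.length for t in transcripts}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; needs at least two points with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def subset_correlation(transcripts: list[Transcript], require_ms: bool = False,
                       require_rpf: bool = False, spc_threshold: int = 2,
                       min_rpf: int = 200) -> tuple[int, float]:
    """Return (number of points, Pearson r) for one Figure 4a-style subset.

    NSAF is computed over all transcripts before filtering, so the subset
    filters do not change the normalization.
    """
    abundance = nsaf(transcripts)
    kept = [t for t in transcripts
            if (not require_ms or t.spectral_count > spc_threshold)
            and (not require_rpf or t.rpf_count >= min_rpf)]
    r = pearson_r([abundance[t.name] for t in kept], [t.rpf_count for t in kept])
    return len(kept), r


def stability_class(instability_index: float) -> str:
    """ProtParam convention used in Figure 4b: an index below 40 means stable."""
    return "stable" if instability_index < 40 else "unstable"
```

Calling `subset_correlation` four times with the two flags toggled reproduces the structure of the four panels in Figure 4a.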

Cited by

Nascent alt-protein chemoproteomics reveals a pre-60S assembly checkpoint inhibitor.
Cao X, Khitun A, Harold CM, Bryant CJ, Zheng SJ, Baserga SJ, Slavoff SA. Nat Chem Biol. 2022 Jun;18(6):643-651. doi: 10.1038/s41589-022-01003-9. Epub 2022 Apr 7. PMID: 35393574.
Exploring Ribosome-Positioning on Translating Transcripts with Ribosome Profiling.
Cope AL, Vellappan S, Favate JS, Skalenko KS, Yadavalli SS, Shah P. Methods Mol Biol. 2022;2404:83-110. doi: 10.1007/978-1-0716-1851-6_5. PMID: 34694605.
The small peptide world in long noncoding RNAs.
Choi SW, Kim HW, Nam JW. Brief Bioinform. 2019 Sep 27;20(5):1853-1864. doi: 10.1093/bib/bby055. PMID: 30010717. Review.
Analysis of herbivore-responsive long noncoding ribonucleic acids reveals a subset of small peptide-coding transcripts in Nicotiana tabacum.
Jin J, Meng L, Chen K, Xu Y, Lu P, Li Z, Tao J, Li Z, Wang C, Yang X, Yu S, Yang Z, Cao L, Cao P. Front Plant Sci. 2022 Sep 23;13:971400. doi: 10.3389/fpls.2022.971400. eCollection 2022. PMID: 36212334.
Short open reading frames (sORFs) and microproteins: an update on their identification and validation measures.
Leong AZ, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. J Biomed Sci. 2022 Mar 17;29(1):19. doi: 10.1186/s12929-022-00802-5. PMID: 35300685. Review.

References

1. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20:3551–3567.
1. Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20:1466–1467.
1. Geer LY, Markey SP, Kowalak JA, Wagner L, et al. Open mass spectrometry search algorithm. Journal of Proteome Research. 2004;3:958–964.
1. Staes A, Impens F, Van Damme P, Ruttens B, et al. Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nature Protocols. 2011;6:1130–1141.
1. Nagaraj N, Wisniewski JR, Geiger T, Cox J, et al. Deep proteome and transcriptome mapping of a human cancer cell line. Molecular Systems Biology. 2011;7:548.
