
A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites

Proteomics. 2014 Dec;14(23-24):2688-98.

doi: 10.1002/pmic.201400180. Epub 2014 Oct 2.



Alexander Koch et al. Proteomics. 2014 Dec.

Abstract

Next-generation transcriptome sequencing is increasingly integrated with MS to enhance MS-based protein and peptide identification. Recently, a breakthrough in transcriptome analysis was achieved with the development of ribosome profiling (ribo-seq). This technology is based on the deep sequencing of ribosome-protected mRNA fragments, thereby enabling the direct observation of in vivo protein synthesis at the transcript level. In order to explore the impact of a ribo-seq-derived protein sequence search space on MS/MS spectrum identification, we performed a comprehensive proteome study on a human cancer cell line, using both shotgun and N-terminal proteomics, next to ribosome profiling, which was used to delineate (alternative) translational reading frames. By including protein-level evidence of sample-specific genetic variation and alternative translation, this strategy improved the identification score of 69 proteins and identified 22 new proteins in the shotgun experiment. Furthermore, we discovered 18 new alternative translation start sites in the N-terminal proteomics data and observed a correlation between the quantitative measures of ribo-seq and shotgun proteomics with a Pearson correlation coefficient ranging from 0.483 to 0.664. Overall, this study demonstrated the benefits of ribosome profiling for MS-based protein and peptide identification and we believe this approach could develop into a common practice for next-generation proteomics.

Keywords: Bioinformatics; N-terminomics; Proteogenomics; Ribosome profiling; Translation initiation.

© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


Figures

Figure 1. Proteogenomic strategy for the identification of proteins and peptides using a Swiss-Prot/ribo-seq-derived database

Ribo-seq was performed twice on the human colon cancer cell line HCT116, once with CHX to halt translation globally and once with LTM to stop translation specifically at translation initiation sites. After translation initiation site (TIS) prediction, the ribo-seq-derived ORFs were translated to create a custom protein sequence database. This database was then combined with the human Swiss-Prot protein sequence database. Proteome samples were prepared from the same HCT116 cells and analyzed using both shotgun proteomics and N-terminal COFRADIC. The proteins and peptides in these samples were then identified using the custom combined protein search space.
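The database-combination step in this caption can be illustrated with a minimal sketch. It assumes both databases are available as FASTA text and that entries with identical sequences should be collapsed, with the Swiss-Prot header taking precedence. The deduplication rule, the function names, and the toy records are illustrative assumptions and are not described in the source.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_databases(swissprot: Iterable[Record],
                      riboseq: Iterable[Record]) -> List[Record]:
    """Merge two record collections into one search space.

    Assumption (not from the source): entries are deduplicated by exact
    sequence, and a Swiss-Prot header wins over a ribo-seq header.
    """
    by_sequence: Dict[str, str] = {}  # sequence -> header
    for header, seq in swissprot:
        by_sequence.setdefault(seq, header)
    for header, seq in riboseq:
        by_sequence.setdefault(seq, header)
    return [(header, seq) for seq, header in by_sequence.items()]


# Toy example with hypothetical records.
swissprot = list(parse_fasta([">sp|P1|A_HUMAN", "MKT", "AAA",
                              ">sp|P2|B_HUMAN", "MLL"]))
riboseq = list(parse_fasta([">ribo_1", "MKTAAA", ">ribo_2", "MAAA"]))
combined = combine_databases(swissprot, riboseq)
```

In this toy case `ribo_1` duplicates a Swiss-Prot sequence and is dropped, while `ribo_2` (a hypothetical N-terminally distinct product) is added as a new search-space entry.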

Figure 2. Bar charts showing the number of protein and peptide identifications obtained from the shotgun proteomics and N-terminal COFRADIC experiments

a) Shotgun proteomics. The custom combined protein sequence database resulted in the identification of 2,816 proteins. Most of these proteins (2,482 or 88.1%) were picked up by both databases independently, while 312 and 22 proteins were uniquely identified in the Swiss-Prot and ribo-seq databases respectively. The 22 unique ribo-seq identifications contained six new proteins, 13 proteins with a mutation site and 3 unannotated isoforms. The ribo-seq data also improved the protein identification and score of 69 proteins. b) N-terminal COFRADIC. Most of the 1,289 peptides that were found in the custom combined protein sequence database mapped to canonical, annotated N-termini (1,071 dbTIS peptides or 83.1%). Of the remaining N-termini, 208 started downstream of the canonical start site (beyond protein position 2), 9 mapped to a 5′-extension and one to an uORF. For both the up- and downstream start sites, we identified several near-cognate start sites.
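The counts quoted in this caption are internally consistent, which a short arithmetic check makes explicit. The numbers below are taken from the caption; the variable names are illustrative only.

```python
# Shotgun proteomics: identifications per database (from the caption).
shotgun = {"both": 2482, "swissprot_only": 312, "riboseq_only": 22}
total_proteins = sum(shotgun.values())
shared_pct = round(100 * shotgun["both"] / total_proteins, 1)

# Breakdown of the identifications unique to the ribo-seq database.
riboseq_unique = {"new_proteins": 6, "mutation_site": 13,
                  "unannotated_isoforms": 3}

# N-terminal COFRADIC: peptide classes (from the caption).
nterm = {"dbTIS": 1071, "downstream": 208, "five_prime_extension": 9,
         "uORF": 1}
total_peptides = sum(nterm.values())
dbtis_pct = round(100 * nterm["dbTIS"] / total_peptides, 1)

print(total_proteins, shared_pct)   # 2816 88.1
print(total_peptides, dbtis_pct)    # 1289 83.1
```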

Figure 3. Depiction of two different N-termini that were predicted by ribo-seq and identified using N-terminal COFRADIC

The figure shows a 5′-extension (Swiss-Prot entry name RBP2_HUMAN) and an N-terminal truncation (Swiss-Prot entry name HNRPL_HUMAN). The UCSC genome browser [57] was used to create the plots of the ribo-seq and N-terminal COFRADIC data and the different browser tracks are from top to bottom: CHX treatment data, LTM treatment data, N-terminal COFRADIC data, UCSC genes, RefSeq genes and human mRNA from GenBank. The different start sites (a: alternative start site, b: canonical start site) are clearly visible in the zoomed genome browser views, just as the three-nucleotide periodicity of the ribo-seq data, especially in the N-terminal truncation image. The MS/MS spectra and sequence fragmentations indicate the confidence and quality of the peptide identifications.

Figure 4

a) Correlation plots of protein abundance estimates based on NSAF values and RPF counts. Top left: all dbTIS transcripts; top right: dbTIS transcripts with a validated MS/MS-based identification (i.e. transcripts with a spectral count value > 2); bottom left: dbTIS transcripts with an RPF count ≥ 200; bottom right: dbTIS transcripts with both a validated MS identification and an RPF count ≥ 200. The regression line is shown in green. For each plot, the number of data points used (i.e. the number of dbTIS transcripts) as well as the corresponding Pearson correlation coefficient (r²) is shown. b) Correlation plot with the inclusion of stability data. Only dbTIS transcripts with both a validated MS/MS-based identification and an RPF count ≥ 200 were used (bottom right plot in Figure 4a). Instability indexes were determined with the ProtParam tool [45]: proteins with an instability index < 40 were classified as stable and are shown in blue, whereas proteins with an instability index ≥ 40 were classified as unstable and are shown in orange.
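The quantities in this caption can be sketched in a few lines. The block below uses the standard definition of the normalized spectral abundance factor (spectral count divided by protein length, normalized over all proteins), a plain Pearson correlation, and the two thresholds stated in the caption. It is a sketch under assumptions: the function names are hypothetical, and whether the authors transformed the values (for example, logarithmically) before correlating them is not stated here, so no transformation is applied.

```python
import math
from typing import List, Sequence


def nsaf(spectral_counts: Sequence[float],
         lengths: Sequence[int]) -> List[float]:
    """Normalized spectral abundance factor for each protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    return [value / total for value in saf]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def passes_filters(spectral_count: int, rpf_count: int) -> bool:
    """Caption thresholds: spectral count > 2 and RPF count >= 200."""
    return spectral_count > 2 and rpf_count >= 200


def stability_class(instability_index: float) -> str:
    """Caption threshold: index < 40 is stable, otherwise unstable."""
    return "stable" if instability_index < 40 else "unstable"


# Toy example with made-up values (not data from the study).
counts = [10, 20, 5]
lengths = [100, 100, 250]
abundances = nsaf(counts, lengths)
rpf_counts = [300.0, 650.0, 40.0]
r = pearson_r(abundances, rpf_counts)
```

Applying `passes_filters` before computing `r` would correspond to the bottom-right panel of Figure 4a, which keeps only transcripts meeting both thresholds.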



References

    1. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20:3551–3567.
    2. Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20:1466–1467.
    3. Geer LY, Markey SP, Kowalak JA, Wagner L, et al. Open mass spectrometry search algorithm. Journal of proteome research. 2004;3:958–964.
    4. Staes A, Impens F, Van Damme P, Ruttens B, et al. Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nature protocols. 2011;6:1130–1141.
    5. Nagaraj N, Wisniewski JR, Geiger T, Cox J, et al. Deep proteome and transcriptome mapping of a human cancer cell line. Molecular systems biology. 2011;7:548.

