  • Mon Jan 01 2018

An update on sORFs.org: a repository of small ORFs identified by ribosome profiling

Volodimir Olexiouk et al. Nucleic Acids Res. 2018.

Abstract

sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update describes the major improvements implemented since its initial release. sORFs.org now supports three additional species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase over the three processed in the initial release. To accommodate these datasets, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising solely elongating RIBO-seq data, whereas matching initiating RIBO-seq data was previously necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed that distinguishes sORFs with true ribosomal activity from simulated noise, thereby reducing the false positive identification rate. The inclusion of other species also led to the development of an inner BLAST pipeline, which assesses sequence similarity between sORFs in the repository. Building on the proof-of-concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline has now been released; it reprocesses publicly available MS-based proteomics datasets from PRIDE and reports on true translation events. In addition to reporting the identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, enabling detailed manual inspection and interpretation.

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.

An overview of the most important improvements to sORFs.org since its initial release. The modified TIS-calling pipeline, together with the noise filtering algorithm, enabled the inclusion of datasets on additional species for which no initiating RIBO-seq data (LTM- or HAR-treated) were available. Currently, a total of 78 RIBO-seq datasets have been processed, identifying numerous novel sORFs with ribosome occupancy. The inner-BLAST pipeline revealed sORFs with sequence similarity across multiple species, and the PRIDE-ReSpin pipeline provides an extra layer of MS-based translation evidence for many sORFs.

Figure 2.

Visual representation of the noise filtering algorithm. The transcript of the sORF is converted into a binary array in which ‘1’ marks positions covered by a ribosome P-site and ‘0’ marks uncovered positions. This array is then shuffled 10 000 times; in each iteration the in-frame coverage in the sORF region is calculated, yielding a distribution of shuffled in-frame coverage (shown in gray). Next, the probability of sampling a value equal to or greater than the actual in-frame coverage of the sORF (shown in red) is calculated.
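
The shuffling procedure in this caption can be illustrated with a minimal Python sketch. It is not the sORFs.org implementation: the function names, the half-open `[start, end)` coordinates, and the definition of in-frame coverage (the fraction of positions in the sORF's reading frame that carry a P-site) are assumptions made for illustration only.

```python
import random

def in_frame_coverage(cov, start, end):
    """Fraction of in-frame positions (start, start+3, ...) in [start, end)
    that are covered. This definition is an assumption, not taken from the paper."""
    positions = range(start, end, 3)
    return sum(cov[i] for i in positions) / len(positions)

def noise_p_value(cov, start, end, n_shuffles=10_000, seed=0):
    """Empirical probability that a shuffled transcript reaches at least the
    observed in-frame coverage inside the sORF region.

    cov   : list of 0/1 values, one per transcript position (1 = P-site covered)
    start : index of the first nucleotide of the sORF (hypothetical convention)
    end   : index one past the last nucleotide of the sORF
    """
    rng = random.Random(seed)
    observed = in_frame_coverage(cov, start, end)
    shuffled = list(cov)  # work on a copy so the input is left untouched
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if in_frame_coverage(shuffled, start, end) >= observed:
            hits += 1
    return observed, hits / n_shuffles

if __name__ == "__main__":
    # Toy transcript of 60 nt with a 30-nt sORF whose every codon start is covered.
    transcript = [0] * 60
    for i in range(15, 45, 3):
        transcript[i] = 1
    print(noise_p_value(transcript, 15, 45, n_shuffles=2_000))
```

A small returned probability means that random placement of the same number of covered positions rarely reproduces the observed in-frame signal. Any cutoff applied to that probability would be a further assumption, since the caption does not state one.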

Figure 3.

General overview of the PRIDE-ReSpin pipeline. First, MS-based proteomics experiments are downloaded from the PRIDE public repository. Next, a reverse engineering mechanism based on PRIDE-ASAP and Pladipus extracts the database (DB) search parameters for that study. These are passed to the SearchGUI search engine management software, which launches a DB search against a concatenated database consisting of the UniProt reference proteome, the cRAP database and the sORFs.org database, using X!Tandem and MS-GF+ as search engines. Subsequently, the output is imported into PeptideShaker to validate and export identified peptides at an FDR of 1%, with a minimum of 30% spectrum coverage and with no PSMs having a higher confidence for non-sORF peptides. The resulting peptides are then imported into sORFs.org for visualization in the Lorikeet MS/MS browser.
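
The final filtering step described in this caption can be sketched in Python as follows. This is a hedged illustration rather than PeptideShaker's API or the authors' code: the `PSM` record, its field names, and the per-spectrum reading of "no PSMs having a higher confidence to non-sORF peptides" are assumptions, and the 1% FDR validation is assumed to have been applied upstream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (all field names are assumptions)."""
    spectrum_id: str
    peptide: str
    confidence: float         # higher means more confident
    spectrum_coverage: float  # fraction of the spectrum explained, 0..1
    is_sorf: bool             # True if the peptide maps to a sORFs.org entry

def select_sorf_psms(psms, min_coverage=0.30):
    """Keep sORF PSMs with enough spectrum coverage whose spectrum has no
    non-sORF PSM with a strictly higher confidence."""
    # Best confidence of any non-sORF match, per spectrum.
    best_non_sorf = {}
    for p in psms:
        if not p.is_sorf:
            prev = best_non_sorf.get(p.spectrum_id, float("-inf"))
            best_non_sorf[p.spectrum_id] = max(prev, p.confidence)
    return [
        p for p in psms
        if p.is_sorf
        and p.spectrum_coverage >= min_coverage
        and best_non_sorf.get(p.spectrum_id, float("-inf")) <= p.confidence
    ]

if __name__ == "__main__":
    example = [
        PSM("s1", "PEPTIDEA", 99.0, 0.45, True),   # kept
        PSM("s2", "PEPTIDEB", 95.0, 0.20, True),   # dropped: coverage below 30%
        PSM("s3", "PEPTIDEC", 90.0, 0.50, True),   # dropped: s3 matches a non-sORF peptide better
        PSM("s3", "PEPTIDED", 97.0, 0.50, False),
    ]
    for psm in select_sorf_psms(example):
        print(psm.spectrum_id, psm.peptide)
```

Comparing confidences within a single spectrum is one plausible reading of the caption; the criterion could equally be applied per peptide, which this sketch does not cover.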

