An update on sORFs.org: a repository of small ORFs identified by ribosome profiling
Mon Jan 01 2018
Volodimir Olexiouk et al. Nucleic Acids Res. 2018.
Abstract
sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update describes the major improvements implemented since the initial release. sORFs.org now supports three additional species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a large increase over the three processed in the initial release. To accommodate these datasets, a new pipeline was constructed that enables sORF detection in RIBO-seq datasets comprising only elongating RIBO-seq data; previously, matching initiating RIBO-seq data were required to delineate sORFs. Furthermore, a new noise filtering algorithm was designed that distinguishes sORFs with true ribosomal activity from simulated noise, thereby reducing the false positive identification rate. The inclusion of additional species also led to the development of an inner BLAST pipeline that assesses sequence similarity between sORFs in the repository. Building on the proof-of-concept model in the initial release, a full PRIDE-ReSpin pipeline has now been released; it reprocesses publicly available MS-based proteomics datasets from PRIDE and reports peptides that provide evidence of translation. In addition to reporting these peptides, sORFs.org displays the annotated spectra in the Lorikeet MS/MS viewer, enabling detailed manual inspection and interpretation.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures

An overview of the most important improvements to sORFs.org since its initial release. The modified TIS-calling pipeline, together with the noise filtering algorithm, enabled the inclusion of datasets from additional species for which no initiating RIBO-seq data (LTM- or HAR-treated) were available. A total of 78 RIBO-seq datasets are currently processed, identifying numerous novel sORFs with ribosome occupancy. The inner-BLAST pipeline revealed sORFs with sequence similarity across multiple species, and the PRIDE-ReSpin pipeline provides an additional layer of MS-based translation evidence for many sORFs.

Visual representation of the noise filtering algorithm. The transcript of the sORF is converted into a binary array, where ‘1’ represents a position covered by a ribosome P-site and ‘0’ an uncovered position. This array is shuffled 10 000 times, and in each iteration the in-frame coverage of the sORF region is calculated, yielding the distribution of shuffled in-frame coverage shown in gray. Next, the probability of sampling a value equal to or greater than the actual in-frame coverage of the sORF is calculated (shown in red).
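The shuffling test described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the sORFs.org implementation: the function names are hypothetical, and the sketch assumes that "in-frame coverage" means the fraction of codon-start positions (start, start+3, ...) within the sORF that carry a P-site, with the probability taken as the plain fraction of shuffles that reach the observed value.

```python
import random
from typing import List


def in_frame_coverage(covered: List[int], start: int, end: int) -> float:
    """Fraction of in-frame positions in [start, end) marked 1 (P-site present).

    Assumption: in-frame positions are start, start+3, start+6, ...,
    i.e. the first nucleotide of each codon of the sORF.
    """
    in_frame = covered[start:end:3]
    if not in_frame:
        raise ValueError("sORF region is empty")
    return sum(in_frame) / len(in_frame)


def noise_p_value(covered: List[int], start: int, end: int,
                  n_shuffles: int = 10_000, seed: int = 0) -> float:
    """Empirical probability that a shuffled transcript reaches an in-frame
    coverage equal to or greater than the observed one (hypothetical helper)."""
    rng = random.Random(seed)
    observed = in_frame_coverage(covered, start, end)
    shuffled = list(covered)  # copy so the caller's array is left untouched
    hits = 0
    for _ in range(n_shuffles):
        # Permute covered/uncovered positions along the whole transcript.
        rng.shuffle(shuffled)
        if in_frame_coverage(shuffled, start, end) >= observed:
            hits += 1
    return hits / n_shuffles


# Toy transcript of 60 nt whose only P-sites fall on the 10 codons of a
# sORF spanning positions 15..44.
covered = [0] * 60
for i in range(15, 45, 3):
    covered[i] = 1
print(noise_p_value(covered, 15, 45))
```

A small value means the observed in-frame signal is unlikely to arise from randomly placed P-sites; what cutoff counts as "small" is not stated here and would be a separate choice. The default of 10 000 shuffles mirrors the caption.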

General overview of the PRIDE-ReSpin pipeline. First, MS-based proteomics experiments are downloaded from the PRIDE public repository. Next, a reverse-engineering mechanism based on PRIDE-ASAP and Pladipus extracts the database (DB) search parameters of each study. These parameters are passed to the SearchGUI search engine management software, which launches a DB search with X!Tandem and MS-GF+ against a concatenated database consisting of the UniProt reference proteome, the cRAP database and the sORFs.org database. Subsequently, the output is imported into PeptideShaker to validate and export identified peptides at 1% FDR, requiring a minimum spectrum coverage of 30% and that no PSM matching a non-sORF peptide has a higher confidence. The resulting peptides are then imported into sORFs.org for visualization in the Lorikeet MS/MS viewer.
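The peptide acceptance criteria in the final filtering step can be expressed as a simple filter. This sketch is illustrative only and rests on several assumptions: the `PSM` record and `retained_sorf_peptides` function are hypothetical (they do not reflect PeptideShaker's export format), the 1% FDR threshold is approximated by a per-PSM q-value, spectrum coverage is stored as a fraction between 0 and 1, and the "higher confidence" rule is interpreted per spectrum, so a sORF peptide is rejected if a non-sORF peptide matched to the same spectrum scores higher.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PSM:
    # Hypothetical flat record; not PeptideShaker's actual schema.
    spectrum_id: str
    peptide: str
    is_sorf: bool             # True if the peptide comes from the sORFs.org DB
    confidence: float         # higher means more confident
    q_value: float            # stand-in for the 1% FDR validation step
    spectrum_coverage: float  # assumed fraction in [0, 1]


def retained_sorf_peptides(psms: List[PSM], max_fdr: float = 0.01,
                           min_coverage: float = 0.30) -> List[str]:
    # Highest confidence reached by any non-sORF peptide on each spectrum.
    best_non_sorf: Dict[str, float] = {}
    for p in psms:
        if not p.is_sorf:
            prev = best_non_sorf.get(p.spectrum_id, float("-inf"))
            best_non_sorf[p.spectrum_id] = max(prev, p.confidence)

    kept = set()
    for p in psms:
        rival = best_non_sorf.get(p.spectrum_id, float("-inf"))
        if (p.is_sorf
                and p.q_value <= max_fdr
                and p.spectrum_coverage >= min_coverage
                and p.confidence >= rival):  # no higher-confidence non-sORF PSM
            kept.add(p.peptide)
    return sorted(kept)
```

Ties with a non-sORF PSM are accepted here because the criterion only excludes PSMs with a strictly higher confidence; a stricter reading would switch `>=` to `>`.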
Similar articles
- sORFs.org: a repository of small ORFs identified by ribosome profiling. Olexiouk V, Crappé J, Verbruggen S, Verhegen K, Martens L, Menschaert G. Nucleic Acids Res. 2016 Jan 4;44(D1):D324-9. doi: 10.1093/nar/gkv1175. Epub 2015 Nov 2. PMID: 26527729.
- Olexiouk V, Menschaert G. Curr Protoc Bioinformatics. 2019 Mar;65(1):e68. doi: 10.1002/cpbi.68. Epub 2018 Nov 28. PMID: 30485709.
- Perdikopanis N, Giannakakis A, Kavakiotis I, Hatzigeorgiou AG. Biology (Basel). 2024 Jul 26;13(8):563. doi: 10.3390/biology13080563. PMID: 39194501.
- Leong AZ, Lee PY, Mohtar MA, Syafruddin SE, Pung YF, Low TY. J Biomed Sci. 2022 Mar 17;29(1):19. doi: 10.1186/s12929-022-00802-5. PMID: 35300685. Review.
- Evolution of new proteins from translated sORFs in long non-coding RNAs. Ruiz-Orera J, Villanueva-Cañas JL, Albà MM. Exp Cell Res. 2020 Jun 1;391(1):111940. doi: 10.1016/j.yexcr.2020.111940. Epub 2020 Mar 7. PMID: 32156600. Review.
Cited by
- LncSEA: a platform for long non-coding RNA related sets and enrichment analysis. Chen J, Zhang J, Gao Y, Li Y, Feng C, Song C, Ning Z, Zhou X, Zhao J, Feng M, Zhang Y, Wei L, Pan Q, Jiang Y, Qian F, Han J, Yang Y, Wang Q, Li C. Nucleic Acids Res. 2021 Jan 8;49(D1):D969-D980. doi: 10.1093/nar/gkaa806. PMID: 33045741.
- Anders J, Petruschke H, Jehmlich N, Haange SB, von Bergen M, Stadler PF. BMC Bioinformatics. 2021 May 26;22(1):277. doi: 10.1186/s12859-021-04159-8. PMID: 34039272.
- Liu Q, Peng X, Shen M, Qian Q, Xing J, Li C, Gregory RI. Nucleic Acids Res. 2023 Jan 6;51(D1):D248-D261. doi: 10.1093/nar/gkac1094. PMID: 36440758.
- Accurate annotation of human protein-coding small open reading frames. Martinez TF, Chu Q, Donaldson C, Tan D, Shokhirev MN, Saghatelian A. Nat Chem Biol. 2020 Apr;16(4):458-468. doi: 10.1038/s41589-019-0425-0. Epub 2019 Dec 9. PMID: 31819274.
- MiPepid: MicroPeptide identification tool using machine learning. Zhu M, Gribskov M. BMC Bioinformatics. 2019 Nov 8;20(1):559. doi: 10.1186/s12859-019-3033-9. PMID: 31703551.