Fri Jan 01 2021

RiboNT: A Noise-Tolerant Predictor of Open Reading Frames from Ribosome-Protected Footprints

Bo Song et al. Life (Basel). 2021.

Abstract

Ribo-seq, also known as ribosome profiling, refers to the sequencing of ribosome-protected mRNA fragments (RPFs). This technique has greatly advanced our understanding of translation and has facilitated the identification of novel open reading frames (ORFs) within untranslated regions and non-coding sequences, as well as of non-canonical start codons. However, the widespread application of Ribo-seq has been hindered because obtaining periodic RPFs requires a highly optimized protocol, which may be difficult to achieve, particularly in non-model organisms. Furthermore, periodic RPFs are too short (28 nt) to map accurately to polyploid genomes, whereas longer RPFs usually come at the cost of reduced periodicity. Here we present RiboNT, a noise-tolerant ORF predictor that can utilize RPFs with poor periodicity. It evaluates RPF periodicity and automatically weighs the support from RPFs against that from codon usage before combining their contributions to identify translated ORFs. The results demonstrate the utility of RiboNT for identifying both long and small ORFs using RPFs with either good or poor periodicity. We applied the pipeline to a dataset of poorly periodic RPFs derived from membrane-bound polysomes of Arabidopsis thaliana seedlings and identified several small ORFs (sORFs) that are evolutionarily conserved in diverse plant species. RiboNT should greatly broaden the application of Ribo-seq by relaxing the requirements on RPF quality and allowing the use of longer RPFs, which is critical for organisms with complex genomes because longer RPFs can be mapped more accurately to the positions from which they were derived.

Keywords: ORFs; RPFs; Ribo-seq; periodicity; ribosome profiling; small ORFs.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1

The RiboNT workflow consists of six steps. Step 1: assemble transcripts according to the genome annotation and extract candidate ORFs; step 2: examine the quality of RPFs and filter out low-quality RPFs; step 3: calculate the offsets to the start codon for RPFs of each size; step 4: balance the weight between RPFs and codon usage according to the periodicity of RPFs; step 5: identify translated ORFs among the candidate ORFs by combining four Student's t-tests (RPF depth, frame 0 vs. frames 1 and 2; codon usage, frame 0 vs. frames 1 and 2); step 6: classify the predicted ORFs into different classes.
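Step 5 can be illustrated with a minimal Python sketch. It assumes that each candidate ORF has already been reduced to per-codon values for the three frames, both for RPF depth and for a codon-usage score, and that step 4 has produced an RPF weight `w_rpf` between 0 and 1. The caption does not state whether the tests are paired or how the four results are merged, so the pooled-variance two-sample t statistic and the weighted-minimum combination below (`student_t`, `combined_score`) are hypothetical choices, not RiboNT's code; p-values are omitted to keep the sketch within the Python standard library.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) testing mean(a) > mean(b).

    Each sample needs at least two values.
    """
    n1, n2 = len(a), len(b)
    diff = mean(a) - mean(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are constant: only the sign of the mean difference is informative.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def combined_score(rpf, codon, w_rpf):
    """Hypothetical combination of the four frame-0 tests for one candidate ORF.

    rpf, codon: (frame0, frame1, frame2) lists of per-codon values.
    w_rpf: weight of the RPF evidence from step 4; codon usage gets 1 - w_rpf.
    """
    # Frame 0 must beat both off-frames, so keep the weaker comparison per evidence type.
    t_rpf = min(student_t(rpf[0], rpf[k]) for k in (1, 2))
    t_codon = min(student_t(codon[0], codon[k]) for k in (1, 2))
    score = 0.0
    # Skip zero-weight terms so an infinite t cannot produce 0 * inf = nan.
    if w_rpf > 0:
        score += w_rpf * t_rpf
    if w_rpf < 1:
        score += (1 - w_rpf) * t_codon
    return score


if __name__ == "__main__":
    rpf = ([10, 12, 9, 11], [1, 2, 1, 0], [0, 1, 2, 1])
    codon = ([0.8, 0.6, 0.9, 0.7], [0.2, 0.4, 0.1, 0.3], [0.3, 0.2, 0.4, 0.1])
    # A positive score: frame 0 dominates both kinds of evidence.
    print(combined_score(rpf, codon, w_rpf=0.7))
```

Taking the minimum over the two off-frame comparisons is one conservative reading of "frame 0 vs. 1, 2"; averaging them would be an equally plausible alternative.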

Figure 2

Strategy for RPF filtering and weighting. (A) The distribution of high-quality RPFs on CDSs. (B) The depth distribution of the RPFs shown in (A) is transformed into a wave function by connecting the vertices, with the coordinates on the CDSs converted into a timeline in seconds. (C) The resulting wave was subjected to an F-test implemented in the R package “multitaper”. RPFs showing significance (p ≤ 0.01) at a frequency of 0.33 Hz (the period recurs every 3 s) were retained. RPFs without periodicity (D–F) do not satisfy these criteria. The horizontal (gray) and vertical (red) dotted lines in (C) and (F) indicate p = 0.01 and frequency = 0.33 Hz, respectively. (G) The first RPFs on CDSs represent the mRNA fragments protected by ribosomes translating the start codon; their offsets to the start codon were determined as the distance from their 5′ termini to the P-site. (H) RPFs located predominantly in one of the three frames result in lower overall entropy and are weighted more heavily (I) in the identification of ORFs. Conversely, RPFs with weak periodicity (J) have higher overall entropy (K) and are weighted less (L).
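Panels (G–L) describe two computations that are simple to state in code. The Python sketch below shows (i) a start-codon offset estimate per read length, taken as the most common distance between the start codon and the 5′ end of the most upstream RPF just upstream of it, and (ii) a frame-entropy weight that approaches 1 when RPFs fall in a single frame and 0 when they are spread evenly. The window `max_dist`, the data layout and the linear mapping from entropy to weight are assumptions for illustration; the multitaper F-test of panels (A–F) is not reproduced here.

```python
import math
from collections import Counter


def frame_entropy(frame_counts):
    """Shannon entropy (bits) of RPF counts over the three reading frames."""
    total = sum(frame_counts)
    if total == 0:
        return math.log2(3)  # no reads: treat as maximally uninformative
    return -sum((c / total) * math.log2(c / total) for c in frame_counts if c > 0)


def rpf_weight(frame_counts):
    """Hypothetical RPF weight: 1 for a single dominant frame, 0 for an even spread."""
    return 1.0 - frame_entropy(frame_counts) / math.log2(3)


def psite_offsets(start_codons, reads_by_cds, max_dist=20):
    """Most common start-codon offset for each read length.

    start_codons: {cds_id: transcript position of the first nt of the start codon}
    reads_by_cds: {cds_id: [(read_length, 5' end position), ...]}
    max_dist: hypothetical window upstream of the start codon searched for the first RPF.
    """
    votes = {}
    for cds_id, start in start_codons.items():
        first = {}  # read length -> most upstream 5' end within the window
        for length, pos in reads_by_cds.get(cds_id, []):
            if 0 <= start - pos <= max_dist and (length not in first or pos < first[length]):
                first[length] = pos
        # Each CDS casts one vote per read length for its observed offset.
        for length, pos in first.items():
            votes.setdefault(length, Counter())[start - pos] += 1
    return {length: counts.most_common(1)[0][0] for length, counts in votes.items()}
```

For example, RPFs split 8:1:1 across the three frames receive a weight strictly between 0 and 1, while RPFs confined to one frame receive the full weight of 1.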

Figure 3

Searching and classification of ORFs. (A) Stepwise search for the longest ORF candidates, giving priority to those starting with AUG. (B) Classification of the identified ORFs according to their positions relative to the annotated ORFs.
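One possible reading of panel (A) is sketched below in Python: within each stop-codon-delimited segment of each frame, the longest candidate starts at the most upstream AUG, and only segments without an AUG fall back to a near-cognate start. The near-cognate set (`CTG`, `GTG`, `TTG`), the minimum length `min_codons` and the function names are hypothetical; RiboNT's actual search order may differ.

```python
STOPS = {"TAA", "TAG", "TGA"}
NEAR_COGNATE = ("CTG", "GTG", "TTG")  # hypothetical set of alternative start codons


def _start_in_segment(seq, lo, stop):
    """Most upstream ATG in [lo, stop); otherwise the most upstream near-cognate codon."""
    codons = [(j, seq[j:j + 3]) for j in range(lo, stop, 3)]
    for j, c in codons:
        if c == "ATG":
            return j, c
    for j, c in codons:
        if c in NEAR_COGNATE:
            return j, c
    return None


def find_orfs(seq, min_codons=10):
    """Return (start, end, start_codon) for the longest ORF in each stop-delimited segment.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    Segments that run off the 3' end without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        seg_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                hit = _start_in_segment(seq, seg_start, i)
                # Length in codons excludes the stop codon.
                if hit and (i - hit[0]) // 3 >= min_codons:
                    orfs.append((hit[0], i + 3, hit[1]))
                seg_start = i + 3
    return orfs
```

For instance, `find_orfs("CCATGAAATTTGGGTAACC", min_codons=1)` reports a single AUG-initiated ORF in the third frame, spanning positions 2 to 17.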

Figure 4

Performance of RiboNT, RiboCode and RiboTaper in detecting annotated ORFs. The recall (A), precision (B), F-score (C) and percentage supported by MS datasets (D) of annotated ORFs identified by RiboNT (purple), RiboCode (cyan) and RiboTaper (orange) in human (A–D) and yeast (E–H) datasets using RPFs with varying amounts of noise (0–90%). The horizontal dotted lines in (D) and (H) indicate the percentage of MS-validated ORFs in the reference genome.
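The recall, precision and F-score reported in Figures 4–6 follow the standard definitions. A minimal Python version is given below, assuming that a prediction counts as correct only when it matches an annotated ORF exactly; the matching rule used in the paper is not stated in the captions.

```python
def evaluate(predicted, annotated):
    """Recall, precision and F-score of predicted ORFs against annotated ORFs.

    Both arguments are sets of hashable ORF identifiers, e.g. hypothetical
    (transcript, start, stop) tuples; exact matching is an assumption.
    """
    tp = len(predicted & annotated)  # true positives: predictions that are annotated
    recall = tp / len(annotated) if annotated else 0.0
    precision = tp / len(predicted) if predicted else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f_score
```

With three predictions of which two are among four annotated ORFs, recall is 0.5, precision is 2/3 and the F-score is 4/7.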

Figure 5

Performance of RiboNT and RiboCode in translation initiation site (TIS) identification. The recall (A), precision (B) and F-score (C) of TISs identified by RiboNT (purple) and RiboCode (cyan) in human using RPFs with varying amounts of noise (0–90%).

Figure 6

Performance of RiboNT, RiboCode and RiboTaper in sORF identification. The recall (A), precision (B), F-score (C) and percentage supported by MS datasets (D) of sORFs identified by RiboNT (purple), RiboCode (cyan) and RiboTaper (orange) in the yeast genome using RPFs with varying amounts of noise (0–90%). The horizontal dotted line in (D) indicates the percentage of MS-validated sORFs in the reference genome.

Figure 7

Performance of RiboNT, RiboCode, RiboTaper and RiboWave in identifying annotated ORFs from a noisy dataset of human RPFs. (A) Distribution of RPF sizes in the Reid et al. dataset; the vertical dotted line indicates the canonical 28 nt RPF size. (B) The “multitaper” test for periodicity of the Reid et al. RPFs (34 nt) indicates poor periodicity. Prediction of annotated ORFs from the Gao et al. (periodic) and Reid et al. (poorly periodic) datasets by (C) RiboNT, (D) RiboTaper, (E) RiboCode and (F) RiboWave.

Figure 8

Application of RiboNT to a noisy RPF dataset derived from Arabidopsis membrane-bound polysomes. (A) The RPF size distribution from Li et al. peaks at 32 nt; the vertical dotted line indicates the canonical 28 nt RPF. (B) The “multitaper” test for periodicity of the Li et al. RPFs (32 nt) indicates poor periodicity, with high entropy (C). (D) RiboNT recovered the majority of the annotated ORFs in the A. thaliana genome with considerable precision, whereas RiboTaper did not. (E) RiboNT also identified several classes of sORFs, including 114 uORFs, 93 ouORFs, 245 dORFs, 232 odORFs and 13 ncsORFs; sORFs were also identified from transposable element and pseudogene transcripts. (F) The percentage of identified ORFs of each sORF type validated by the MS dataset; error bars indicate standard deviation. The horizontal dotted line indicates the percentage of MS-validated ORFs in the reference genome. (G) The ncsORFs identified from this dataset fall into three groups according to their sequence conservation across plant genomes: group 1 ncsORFs are specific to A. thaliana, group 2 ncsORFs are conserved within the family, and group 3 ncsORFs are conserved from ferns to monocots and eudicots. The scale bar indicates sequence similarity.
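The sORF classes in panel (E) are defined by position relative to the annotated CDS of the same transcript. The Python sketch below encodes one common set of definitions (uORF entirely upstream of the CDS start, ouORF overlapping the CDS start, dORF entirely downstream of the CDS stop, odORF overlapping the CDS stop, ncsORF on a transcript without an annotated CDS). These boundaries, the extra "annotated" and "internal" labels and the coordinate convention are assumptions, not RiboNT's documented rules.

```python
def classify_orf(orf_start, orf_end, cds=None):
    """Classify an ORF by its position relative to the annotated CDS.

    Coordinates are 0-based and end-exclusive on the same transcript.
    cds: (cds_start, cds_end), or None for a non-coding transcript.
    """
    if cds is None:
        return "ncsORF"  # ORF on a transcript with no annotated CDS
    cds_start, cds_end = cds
    if (orf_start, orf_end) == (cds_start, cds_end):
        return "annotated"  # hypothetical label for the annotated ORF itself
    if orf_start < cds_start:
        # Starts in the 5' UTR: either ends before the CDS or overlaps its start.
        return "uORF" if orf_end <= cds_start else "ouORF"
    if orf_start >= cds_end:
        return "dORF"  # entirely within the 3' UTR
    if orf_end > cds_end:
        return "odORF"  # starts inside the CDS and runs past its end
    return "internal"  # hypothetical label: nested within the CDS
```

For a CDS spanning (100, 400), an ORF at (50, 150) is labelled ouORF and one at (350, 450) is labelled odORF.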

References

    1. Kastenmayer J.P., Ni L., Chu A., Kitchen L.E., Au W.-C., Yang H., Carter C.D., Wheeler D., Davis R.W., Boeke J.D., et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 2006;16:365–373. doi: 10.1101/gr.4355406.
    1. Gao X., Wan J., Liu B., Ma M., Shen B., Qian S.-B. Quantitative profiling of initiating ribosomes in vivo. Nat. Methods. 2015;12:147–153. doi: 10.1038/nmeth.3208.
    1. Spealman P., Naik A.W., May G.E., Kuersten S., Freeberg L., Murphy R.F., McManus J. Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data. Genome Res. 2018;28:214–222. doi: 10.1101/gr.221507.117.
    1. Huh W.-K., Falvo J.V., Gerke L.C., Carroll A.S., Howson R.W., Weissman J.S., O’Shea E.K. Global analysis of protein localization in budding yeast. Nature. 2003;425:686–691. doi: 10.1038/nature02026.
    1. Hayden C.A., Jorgensen R.A. Identification of novel conserved peptide uORF homology groups in Arabidopsis and rice reveals ancient eukaryotic origin of select groups and preferential association with transcription factor-encoding genes. BMC Biol. 2007;5:1–30. doi: 10.1186/1741-7007-5-32.