pubmed.ncbi.nlm.nih.gov

An integrated machine-learning model to predict nucleosome architecture - PubMed

Mon Jan 01 2024

An integrated machine-learning model to predict nucleosome architecture

Alba Sala et al. Nucleic Acids Res. 2024.

Abstract

We demonstrate that nucleosomes in the gene body can be accurately located using signal decay theory, assuming two emitters at the beginning and at the end of genes. The generated wave signals can be in phase (leading to well-defined nucleosome arrays) or in antiphase (leading to fuzzy nucleosome architectures). We found that the first (+1) and the last (–last) nucleosomes are contiguous to regions signaled by transcription factor binding sites and by unusual DNA physical properties that hinder nucleosome wrapping. Based on these analyses, we developed a method that combines machine learning and signal transmission theory to predict the basal locations of nucleosomes with an accuracy similar to that of experimental MNase-seq-based methods.
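The abstract does not give the functional form of the emitted signals; the exact equations are in the paper's Materials and methods and are not reproduced here. The following is a minimal sketch, assuming each boundary nucleosome emits an exponentially damped cosine whose period is the nucleosome repeat length. The function names and the values `nrl_bp=165` and `decay_bp=500` are hypothetical illustrations, not the authors' fitted parameters.

```python
import math


def emitter_signal(distance_bp, nrl_bp=165.0, decay_bp=500.0):
    """Hypothetical damped periodic signal emitted by a boundary nucleosome.

    Assumed form: exp(-d / decay_bp) * (1 + cos(2*pi*d / nrl_bp)) / 2, which
    peaks every nrl_bp base pairs and fades with distance d from the emitter.
    """
    envelope = math.exp(-distance_bp / decay_bp)
    wave = (1.0 + math.cos(2.0 * math.pi * distance_bp / nrl_bp)) / 2.0
    return envelope * wave


def combined_signal(gene_length_bp, nrl_bp=165.0, decay_bp=500.0):
    """Sum the signals of emitters at the +1 (x = 0) and -last (x = gene_length_bp) nucleosomes."""
    return [
        emitter_signal(x, nrl_bp, decay_bp)
        + emitter_signal(gene_length_bp - x, nrl_bp, decay_bp)
        for x in range(gene_length_bp + 1)
    ]


def central_contrast(profile, nrl_bp=165):
    """Peak-to-trough range over one repeat length centred on the middle of the gene."""
    mid = len(profile) // 2
    window = profile[mid - nrl_bp // 2 : mid + nrl_bp // 2 + 1]
    return max(window) - min(window)


in_phase = combined_signal(10 * 165)           # whole number of repeats: waves reinforce
anti_phase = combined_signal(int(10.5 * 165))  # half-integer number of repeats: waves cancel
print(central_contrast(in_phase), central_contrast(anti_phase))
```

Under these assumptions, a gene spanning a whole number of repeats gives a strongly modulated (phased) profile in its centre, while a half-integer number of repeats makes the two waves cancel and flattens the profile, mimicking a fuzzy architecture.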


Figures

**Figure 1.**
Average nucleosome coverage (red), TFBS density (yellow) and deformation energy (green) around well-positioned +1 and –last nucleosomes for open NFRs (wider than 215 bp): Long TSSs-NFR (panel A, 2749 genes), Long TTSs-NFR (panel B, 1134 genes); and closed NFRs (shorter than 215 bp): Short TSSs-NFR (panel C, 644 genes) and Short TTSs-NFR (panel D, 942 genes).
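The open/closed split in Figure 1 is a single width threshold. A minimal sketch of that classification follows; `classify_nfr` is a hypothetical helper, the input widths are made-up values, and treating a width of exactly 215 bp as unassigned is an assumption because the caption only specifies "wider than" and "shorter than".

```python
from collections import Counter
from typing import Optional

OPEN_NFR_THRESHOLD_BP = 215  # threshold quoted in the Figure 1 caption


def classify_nfr(width_bp: int) -> Optional[str]:
    """Label an NFR 'open' (wider than 215 bp) or 'closed' (shorter than 215 bp).

    The caption does not say where exactly 215 bp falls; returning None for it
    is an assumption of this sketch.
    """
    if width_bp > OPEN_NFR_THRESHOLD_BP:
        return "open"
    if width_bp < OPEN_NFR_THRESHOLD_BP:
        return "closed"
    return None


# Hypothetical NFR widths (bp), for illustration only.
widths = [300, 150, 215, 220]
print(Counter(classify_nfr(w) for w in widths))
```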

**Figure 2.**
(A, B) Receiver operating characteristic (ROC) curves of our neural network for (A) a 350 bp window and (B) different window sizes (350 bp in blue, 250 bp in orange and 600 bp in green). NFR prediction (grey) against experimental nucleosome coverage (red) for (C) all TSSs (5676 genes), (D) well-defined TSSs (3393 genes), (E) all TTSs (5676 genes) and (F) well-defined TTSs (2076 genes). Green lines denote the average prediction of the +1 (C and D) and –last (E and F) nucleosomes; dark blue marks 2 standard deviations from a fitted Gaussian distribution. Purple lines mark the peak of the predicted NFR probability. Around 19% of the genes were excluded from the analysis because they lacked experimental calls for the +1 and/or –last nucleosome (see Supplementary Table S2).
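The network itself is not described in this abstract, so the sketch below only shows how a ROC curve and its area are obtained from binary labels (for example, NFR vs. nucleosome for each window) and predicted probabilities. The labels and scores are hypothetical; in practice `sklearn.metrics.roc_curve` and `roc_auc_score` compute the same quantities.

```python
def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) lists, sweeping the threshold from high to low score.

    labels: 1 for the positive class (e.g. NFR), 0 otherwise.
    scores: predicted probability of the positive class.
    Tied scores are processed together so ties produce a single diagonal step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


# Hypothetical per-window labels and network outputs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
fpr, tpr = roc_points(labels, scores)
print(auc(fpr, tpr))
```

Comparing window sizes, as in panel B, would amount to repeating this computation on the predictions obtained with each window.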

**Figure 3.**
Nucleosome calls made with nucleR (top, golden boxes) on the coverage from experimental mapping (red profile), compared with the coverage predicted by our combined signal-theory model (green profile; see Materials and methods). Note the cell-to-cell variability, detected as multiple peak calls by nucleR in the top plot.
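nucleR is an R/Bioconductor package and its algorithm is not reproduced here. As a hedged illustration of what calling nucleosomes on a coverage profile involves, this hypothetical Python sketch smooths the coverage with a moving average and reports local maxima above a height threshold; `window` and `min_height` are illustrative choices, and the synthetic profile stands in for real MNase-seq coverage.

```python
import math


def smooth(coverage, window=21):
    """Centred moving average; the window is truncated at the profile edges."""
    half = window // 2
    out = []
    for i in range(len(coverage)):
        lo, hi = max(0, i - half), min(len(coverage), i + half + 1)
        out.append(sum(coverage[lo:hi]) / (hi - lo))
    return out


def call_peaks(coverage, min_height=0.5, window=21):
    """Positions of local maxima of the smoothed coverage that reach min_height."""
    s = smooth(coverage, window)
    return [
        i
        for i in range(1, len(s) - 1)
        if s[i] >= min_height and s[i] > s[i - 1] and s[i] >= s[i + 1]
    ]


# Synthetic, perfectly periodic coverage (period 165 bp), for illustration only.
coverage = [(1 + math.cos(2 * math.pi * x / 165)) / 2 for x in range(500)]
print(call_peaks(coverage))
```

In a real population, cells that place a nucleosome at slightly different positions produce several nearby maxima, which is how a simple caller like this one would register the variability noted in the caption.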

**Figure 4.**
Signal decay model of nucleosome positioning. (A) Experimental (left panel) and predicted (right panel) nucleosome coverage for each gene, with respect to the +1 nucleosome. Genes are sorted by the distance between the +1 and the –last nucleosomes. Colour scale corresponds to normalized nucleosome coverage, from 1 (red) to 0 (white). (B) Nucleosome coverage, experimental (black) and predicted (purple, see Materials and methods Eq. 6) from the +1 nucleosome, averaged across all genes. Genes are classified as phased (DFI < 10) or unphased (DFI > 40). (C) Signals from the +1 (red) and the –last (blue) nucleosomes used to predict the experimental coverage (see Materials and methods Eqs. 4–5).
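The gene ordering and phased/unphased split in Figure 4 can be expressed as a sort and a threshold rule. DFI is not defined in this caption, so the sketch treats it as a precomputed per-gene score; the `Gene` record, its field names and the example genes are hypothetical, and genes with intermediate DFI values are left unassigned because the caption only names the two extremes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    plus1_bp: int       # position of the +1 nucleosome
    minus_last_bp: int  # position of the -last nucleosome
    dfi: float          # precomputed DFI score (definition not given here)


def phase_class(dfi: float) -> Optional[str]:
    """Phased if DFI < 10, unphased if DFI > 40, otherwise unassigned (None)."""
    if dfi < 10:
        return "phased"
    if dfi > 40:
        return "unphased"
    return None


def sort_by_array_length(genes: List[Gene]) -> List[Gene]:
    """Order genes by the distance between the +1 and -last nucleosomes."""
    return sorted(genes, key=lambda g: g.minus_last_bp - g.plus1_bp)


# Hypothetical genes, for illustration only.
genes = [
    Gene("geneA", 0, 1650, 5.0),
    Gene("geneB", 0, 900, 55.0),
    Gene("geneC", 0, 1200, 25.0),
]
for g in sort_by_array_length(genes):
    print(g.name, g.minus_last_bp - g.plus1_bp, phase_class(g.dfi))
```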

**Figure 5.**
Coverage of highly (713 genes) and lowly (669 genes) expressed genes against a periodic nucleosome repeat length for (A) our experimental data and (B) the combined predicted coverage. Autocorrelation scores for highly (red) and lowly (blue) expressed genes derived from (C) experimental MNase-seq data and (D) the coverage from our full prediction (see Materials and methods).
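The caption does not state how the autocorrelation scores are computed. One common choice, assumed here, is the Pearson correlation between a coverage profile and the same profile shifted by one nucleosome repeat length: a regularly spaced array scores close to 1, while a fuzzy profile scores lower. The 165 bp lag and the synthetic profile are illustrative assumptions.

```python
import math


def autocorrelation(profile, lag):
    """Pearson correlation between profile[i] and profile[i + lag]."""
    n = len(profile) - lag
    x, y = profile[:n], profile[lag:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Synthetic, perfectly periodic coverage (assumed repeat length 165 bp).
profile = [(1 + math.cos(2 * math.pi * x / 165)) / 2 for x in range(1650)]
print(autocorrelation(profile, 165), autocorrelation(profile, 82))
```

At a lag of one repeat the periodic profile correlates almost perfectly with itself; at half a repeat, peaks align with troughs and the score becomes strongly negative.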

**Figure 6.**
(A) Schematic representation of the experimental design. The exact location of the 81-nt insertion for each of the eight genes is indicated in Table 1 and represented in Supplementary Figures S13 and S14B and C. Nucleosome coverage for the selected genes in the unmodified strain (top panels) and the strain with the 81-nt insertion (bottom panels). Average of (B) the four genes phased in the unmodified strain (UBX5, CKB2, PPT1 and TRP4) (blue line) and (C) the four genes unphased in the unmodified strain (BSP1, DGK1, SLM3 and PAN5) (orange line).

**Figure 7.**
Effect of transcription on nucleosome positioning. (A) Change in the proportion of fuzzy and well-positioned nucleosomes upon transcription inhibition; bars indicate the relative standard error. (B) Change in NFR width at the TSS (−1 to +1 nucleosome distance) upon transcription inhibition with 1,10-phenanthroline; only cases with significant displacements (>20 bp) are included in the box plots. (C) Mean autocorrelation scores of control (Ctrl) and phenanthroline (Ph) samples for the four strains previously described (see Figure 6A and Table 1) and for all genes with a well-defined +1 and –last nucleosome.
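The NFR width in panel B is the −1 to +1 nucleosome distance, and only displacements above 20 bp are kept. The caption does not say exactly what is displaced, so this hypothetical sketch assumes a gene is retained when either its −1 or its +1 nucleosome moves by more than 20 bp, and then reports the change in NFR width. The gene names and positions are made up.

```python
def nfr_width(minus1_bp, plus1_bp):
    """NFR width at the TSS, taken as the -1 to +1 nucleosome distance."""
    return plus1_bp - minus1_bp


def significant_width_changes(ctrl, treated, min_shift_bp=20):
    """Width change (treated - control) for genes whose -1 or +1 nucleosome moved > min_shift_bp.

    ctrl, treated: dict mapping gene name -> (minus1_bp, plus1_bp).
    """
    changes = {}
    for gene, (m1, p1) in ctrl.items():
        if gene not in treated:
            continue
        t_m1, t_p1 = treated[gene]
        if max(abs(t_m1 - m1), abs(t_p1 - p1)) > min_shift_bp:
            changes[gene] = nfr_width(t_m1, t_p1) - nfr_width(m1, p1)
    return changes


# Hypothetical positions (bp relative to the TSS), for illustration only.
ctrl = {"geneA": (-150, 50), "geneB": (-120, 40)}
treated = {"geneA": (-150, 80), "geneB": (-125, 45)}
print(significant_width_changes(ctrl, treated))
```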

