Sun Jan 01 2017

The three-dimensional genome organization of Drosophila melanogaster through data integration

Qingjiao Li et al. Genome Biol. 2017.

Abstract

Background: Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. Integrating such data from various complementary experimental methods remains a major challenge. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments.

Results: Our structures resolve the genome at the resolution of topological domains and simultaneously reproduce both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells and hence accounts for the expected plasticity of genome structures. As a case study, we chose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and they reproduce experimental hallmarks of D. melanogaster genome organization observed both in independent imaging experiments and in our own. They also reveal a number of new insights into genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data.

Conclusions: Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.

Keywords: 3D genome structure; Data integration; Drosophila melanogaster; Heterochromatin; Hi-C; Higher order genome organization; Homologous pairing; Lamina-DamID; Population-based modeling.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1

Overview of the population-based genome structure modeling approach and its application to the Drosophila genome. a The initial structures are random configurations. Maximum likelihood optimization is achieved through an iterative process with two steps, assignment (A) and modeling (M). We increase the optimization difficulty over several stages by including contacts from the Hi-C matrix A with progressively lower probability thresholds (θ). After the population reproduces the complete Hi-C data, we include the vector E (lamina-DamID), again in stages with decreasing contact probability thresholds (λ). b Schematic of the Drosophila genome. The chromosome arms are designated 2L, 2R, 3L, 3R, 4, and X. The arms of chr2 and chr3 are connected by centromeres labeled “C”. Euchromatic regions are labeled with the name of the arm. The numbers along the top of each chromosome indicate the length of each section in megabases (Mb); for euchromatin, the number of spheres (TADs) in the structure model is also given. The heterochromatic region of each chromosome arm is labeled “H”. The white gene is located ~19 Mb away from the heterochromatin of chrX. Also indicated are the Hox genes: five genes of the Antennapedia complex (ANT-C) are located ~2.3–2.8 Mb from the heterochromatin of chr3R, and three genes of the Bithorax complex (BX-C) are located ~12.4–12.7 Mb from the heterochromatin of chr3R. c Snapshot of a single structure randomly picked from the final population. Left panel: The full diploid chromosomes are shown in color: blue, chr2; green, chr3; magenta, chr4; orange, chrX. The two homologs of each chromosome are distinguished by color tone, one lighter and one darker. The heterochromatin spheres are larger than the euchromatin domains. The nucleolus is colored silver. Right panel: The euchromatin domains are colored by epigenetic class: red, active; blue, PcG; green, HP1; dark purple, null. Heterochromatin spheres are shown in grey and the nucleolus in pink.
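The assignment (A) step in panel a can be illustrated with a toy sketch. It assumes a simplified rule that is not stated in this caption: for one pair of domains with Hi-C contact probability p in a population of N structures, the contact is assigned to the round(p·N) structures in which the pair is already closest, and pairs below the current stage threshold θ are skipped. The names `assign_contacts`, `stage_pairs`, and the intermediate thresholds in `THETA_STAGES` are hypothetical; only the final 6% cutoff comes from the Fig. 2 caption.

```python
def assign_contacts(distances, target_prob, theta):
    """Toy assignment (A) step for a single pair of domains.

    distances   -- distance between the two domains in each of the N structures
    target_prob -- Hi-C contact probability of the pair
    theta       -- probability threshold of the current optimization stage
    Returns the sorted indices of structures that receive a contact restraint.
    """
    if target_prob < theta:
        return []  # pair is not yet included at this stage
    n_required = round(target_prob * len(distances))
    # Hypothetical rule: restrain the structures where the pair is already closest,
    # so the modeling (M) step has to move the population as little as possible.
    closest_first = sorted(range(len(distances)), key=lambda k: distances[k])
    return sorted(closest_first[:n_required])


def stage_pairs(contact_probs, theta):
    """Pairs whose Hi-C probability reaches the threshold of the current stage."""
    return [pair for pair, p in contact_probs.items() if p >= theta]


# Hypothetical stage schedule: thresholds decrease, so more contacts enter each stage.
THETA_STAGES = [1.0, 0.2, 0.1, 0.06]
```

For example, with toy distances `[3.0, 1.0, 5.0, 2.0, 4.0]` and p = 0.4, nothing is assigned at θ = 1.0, and from θ = 0.2 onward two of the five structures (indices 1 and 3) receive the restraint.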

Fig. 2

Reproduction of Hi-C and lamina-DamID data. a Heat maps of intra-arm contact probabilities from Hi-C experiments (left) and intra-arm contact frequencies from the structure population (right). Their similarity is quantified by element-wise Pearson’s correlations, which are 0.984, 0.985, 0.984, 0.986, and 0.980 for chr2L, chr2R, chr3L, chr3R, and chrX, respectively. The maps show only interactions with probabilities of at least 6%, which are the ones used as constraints in our modeling procedure. The color scale saturates at probability 0.2 so that regions away from the diagonal (long-range interactions) are not rendered too faint for comparison. b Agreement between experimental and model contact probabilities. Left panel: The input Hi-C contact probabilities are divided into 100 bins; the model contact probabilities within each bin are summarized by their mean and variance and shown as an error bar plot. The blue dashed line is the linear regression line between the average model contact probability of each bin and the midpoint Hi-C contact probability of that bin. Their Pearson’s correlation is 0.998 (p value < 2.2e−16). Right panel: Close-up of the agreement between experiment and model for contacts with probabilities below 6%, which are not used as constraints in our modeling procedure. In this range, the Pearson’s correlation is 0.907 (p value = 4.87e−3). c Agreement between nuclear envelope (NE) association frequencies from lamina-DamID experiments and the model population, plotted in the same way as b. The structure population closely reproduces the input frequencies derived from lamina-DamID data, with a Pearson’s correlation of 0.95 (p value < 2.2e−16). d Comparison of experimental and model lamina-DamID frequencies on chrX. The top panel shows the input frequencies derived from the lamina-DamID signal, the middle panel shows the fraction of domains located at the NE in the structure population obtained by integrating Hi-C and lamina-DamID data, and the bottom panel shows the corresponding fractions in our control structure population generated using only Hi-C data.

Fig. 3

Residual polarized organization. a Projected localization probability densities (LPDs) of centromeres and peri-telomeric sequences for all chromosome arms, calculated from the structure population. Probability densities are determined with respect to two principal axes of the nuclear architecture. The z-axis runs through the center of the nucleolus and has its origin at the nuclear center. The radial axis gives the distance of a point from the central z-axis (shown in the left panel of b). The left half of each projected localization density plot mirrors the right half for visual convenience. b The genome organization of different chromosome arms in one genome structure.
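The two-axis projection behind these LPD plots can be sketched as below, assuming Cartesian coordinates are available for the nuclear center, the nucleolus center, and each locus; the function name is hypothetical. The z coordinate is the signed distance along the axis pointing from the nuclear center toward the nucleolus, and the radial coordinate is the distance from that axis. Histogramming the resulting (z, radial) pairs over the population would then give a projected density.

```python
import math


def project_to_nucleolar_axes(point, nucleus_center, nucleolus_center):
    """Return (z, radial) coordinates of a point in the nucleolus-anchored frame."""
    # Unit vector from the nuclear center toward the nucleolus center: the z-axis.
    axis = [b - a for a, b in zip(nucleus_center, nucleolus_center)]
    norm = math.sqrt(sum(x * x for x in axis))
    u = [x / norm for x in axis]
    # Position of the point relative to the origin (the nuclear center).
    rel = [p - c for p, c in zip(point, nucleus_center)]
    z = sum(a * b for a, b in zip(rel, u))  # component along the z-axis
    # Pythagoras: squared distance from the axis = |rel|^2 - z^2 (clamped for rounding).
    radial = math.sqrt(max(sum(x * x for x in rel) - z * z, 0.0))
    return z, radial
```

Because the radial coordinate is always non-negative, the mirrored left half of each plot would be obtained by plotting every point a second time at −radial.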

Fig. 4

Heterochromatin and nucleolus positions. a Left panel: Localization probability density (LPD) plots of the nucleolus and all pericentromeric heterochromatin regions in the model. On average, the nucleolus occupies an intermediate position between the center and the periphery and is surrounded by pericentromeric heterochromatin. Right panel: LPD plots for the pericentromeric heterochromatin of each chromosome arm. Each arm exhibits a distinct preferred location; those of chr4 and chrX are significantly more peripheral than those of the other chromosomes. b Clustering of pericentromeric heterochromatin regions based on their averaged surface-to-surface distances. Heterochromatin domains of arms from the same chromosome tend to cluster together. Heterochromatin domains from chr4 and chrX are usually closer to each other than to those from other chromosomes. c Left panel: FISH signals in larval brain cells. The image shows the middle Z-stack of a representative nucleus. Scale bar = 1 μm. Right panel: Positions of the FISH probes used in this study relative to the pericentromeric regions of each chromosome (chrX, chr2, chr4). Note that the 359bp probe signal (orange in the scheme) is rendered in white in the FISH image. d Top panel: Relative positions (center-to-center distance normalized by nuclear diameter) of heterochromatic satellites from different chromosomes, measured in FISH experiments on larval brains; ****p value < 0.0001 by paired t-test, N = 55 cells. Bottom panel: Pairwise distances (surface-to-surface distance normalized by nuclear diameter) between the heterochromatin domains as measured in the model. As in vivo, the distance between the heterochromatin domains of chrX and chr4 is significantly smaller than the distances between the other two pairs (paired t-tests, p value < 2.2e−16). e Left panel: Positions of heterochromatic satellites from different chromosomes relative to the nuclear periphery, obtained from FISH experiments on larval brain cells. The heterochromatic satellites on chrX and chr4 are closer to the NE than those of chr2. Right panel: Distance from the center of each heterochromatin domain to the NE, normalized by nuclear diameter, as measured in the model. The models agree well with the main experimental trends: chrX and chr4 have higher histogram peaks located closer to the NE than chr2 and show a more focused localization probability toward the nuclear envelope. Note that the physical volume of the satellite repeats (imaged by FISH) is much smaller than that of the entire heterochromatin domain, which the model represents as a relatively large sphere. This difference explains why the model histograms are offset at small distances (i.e., they start at larger values): the offset corresponds to the radii of the respective spheres (0.09, 0.05, and 0.08 normalized by nuclear diameter for chrX, chr4, and chr2R, respectively). For example, if a heterochromatin sphere touches the NE, its center lies exactly one radius from the NE by definition, whereas satellite repeats located inside the sphere could still be close to the NE.
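The two distance measures used in panels d and e can be written down directly. This sketch assumes that heterochromatin domains are spheres and that the nucleus is a sphere of radius R centered at the origin (an assumption of the sketch, not stated in the caption); the helper names are hypothetical. Surface-to-surface distance subtracts both radii from the center-to-center distance, whereas the center-to-NE distance does not subtract the domain radius, which is the source of the histogram offset described above.

```python
import math


def surface_to_surface(c1, r1, c2, r2):
    """Gap between two spheres; 0 if they touch or overlap."""
    return max(math.dist(c1, c2) - r1 - r2, 0.0)


def center_to_envelope(center, nuclear_radius):
    """Distance from a sphere's center to the envelope of a spherical nucleus at the origin."""
    return nuclear_radius - math.dist(center, (0.0, 0.0, 0.0))


def normalized(distance, nuclear_radius):
    """Express a distance as a fraction of the nuclear diameter (2R)."""
    return distance / (2.0 * nuclear_radius)
```

For instance, in a nucleus with R = 5, a sphere of radius 0.9 centered at (4.1, 0, 0) touches the NE, yet its normalized center-to-NE distance is about 0.09 rather than 0, matching the chrX offset quoted in the caption.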

Fig. 5

Localization of euchromatin domains in the structure population. a The average radial position for each euchromatin domain, plotted by position along its chromosome. The 0 location along the x-axis (vertical dashed line) of chr2 represents the euchromatin region closest to the centromere, with 2L domains on the left and 2R domains on the right. Chr3 domains are plotted with the same coordinate system as chr2. The domains of chr4 are plotted from left to right, while the domains of chrX are plotted from right to left; this convention follows the schematics in Fig. 1. Centromeric regions and pericentromeric heterochromatin regions are not shown in this figure. The domains near pericentromeric regions are closer to the nuclear center on average, while the domains near telomeric ends are preferentially close to the nuclear periphery. b The average radial positions of each domain, grouped by epigenetic class. c Localization probability density (LPD) plots of all euchromatin domains from each chromosome arm in nuclear space. d LPD plot of all euchromatin domains
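Panels a and b reduce the population first to one average radial position per domain and then to one summary per epigenetic class. A minimal sketch, assuming a spherical nucleus of radius R centered at the origin and that every structure lists domain centers in the same order; the function names and the normalization by R are assumptions, not taken from the caption.

```python
import math
from collections import defaultdict
from statistics import mean


def average_radial_positions(structures, nuclear_radius):
    """Mean normalized radial position (0 = center, 1 = periphery) of each domain.

    structures -- list of structures, each a list of (x, y, z) domain centers
                  given in the same domain order.
    """
    n_domains = len(structures[0])
    return [
        mean(math.hypot(*s[d]) / nuclear_radius for s in structures)
        for d in range(n_domains)
    ]


def mean_by_class(values, classes):
    """Average per-domain values within each epigenetic class label."""
    groups = defaultdict(list)
    for value, label in zip(values, classes):
        groups[label].append(value)
    return {label: mean(vals) for label, vals in groups.items()}
```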

Fig. 6

Analysis of homologous pairing. a Schematic view of surface-to-surface distances between homologous domains. Different domains exhibit different degrees of homolog pairing. b Left panel: Pairing frequency for each euchromatin domain, plotted by chromosome. We define a domain as being paired in a structure if the surface-to-surface distance between the two homologs is less than 200 nm. The x-axes are the same as in Fig. 5a. The domains are colored by their epigenetic classes: green, HP1; blue, PcG; black, null; red, active. Right panel: Density plots of the domain pairing frequencies, grouped by epigenetic class. The active class has the smallest mean homologous pairing frequency for each chromosome. c Reproducibility of the average homolog distances between two independently generated structure populations. The Pearson’s correlation between them is 0.998, with p value < 2.2e−16. d The correlation between the pairing frequencies of homologous domains and their Mrg15 enrichment is negative. The Mrg15 scores range from 0.8 to 3.0 and are divided into 21 equal bins. The pairing frequencies from our models within each Mrg15 bin are summarized by their mean and variance, and the variance is displayed as an error bar. The blue dashed line is the linear regression between the average pairing frequency in each bin and the midpoint Mrg15 enrichment value of the bin. The Pearson’s correlation between them is −0.81, with p value = 7.59e−06. e Left panel: Pairing frequencies of homologous domains grouped by epigenetic class. Right panel: Enrichments of Mrg15 binding grouped by epigenetic class. Active domains are generally more enriched with Mrg15, and have lower pairing frequencies, than the other three repressive classes.
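The pairing definition in panel b and the Mrg15 binning in panel d translate into two small helpers. This sketch assumes per-structure surface-to-surface homolog distances are already available in nanometers; the 200 nm cutoff, the 0.8 to 3.0 score range, and the 21 equal-width bins come from the caption, while the function names and the clamping of out-of-range scores are assumptions.

```python
PAIRING_CUTOFF_NM = 200.0  # paired if homolog surface-to-surface distance < 200 nm
MRG15_MIN, MRG15_MAX, MRG15_BINS = 0.8, 3.0, 21


def pairing_frequency(homolog_gaps_nm):
    """Fraction of structures in which the two homologs of a domain are paired."""
    paired = sum(1 for gap in homolog_gaps_nm if gap < PAIRING_CUTOFF_NM)
    return paired / len(homolog_gaps_nm)


def mrg15_bin(score):
    """Index (0..20) of the equal-width Mrg15 bin; out-of-range scores are clamped."""
    width = (MRG15_MAX - MRG15_MIN) / MRG15_BINS
    k = int((score - MRG15_MIN) / width)
    return min(max(k, 0), MRG15_BINS - 1)
```

Averaging `pairing_frequency` values within each `mrg15_bin` and correlating those averages with the bin midpoints follows the same pattern as the Hi-C comparison in Fig. 2b.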

Fig. 7

Transcriptional efficiency and DNA replication timing for genes in two subclasses of the active domains. a Domains in the “active-loose” subclass have lower frequencies of homolog pairing than those in the “active-tight” subclass (Additional file 1: Supplementary methods C.6). The active-tight subclass includes 71 domains and the active-loose subclass includes 423 domains. All statistical tests are one-tailed Mann–Whitney U tests. Left panel: Domains in the active-tight subclass contain significantly more genes than domains in the active-loose subclass. Right panel: Genes in both subclasses have similar average expression values. b TBP (TATA-binding protein) binding, RNA polymerase II binding, and H3K4me2 signals are more enriched in domains of the active-loose subclass. c The formaldehyde-assisted isolation of regulatory elements (FAIRE) signal is significantly stronger in domains of the active-loose subclass. d The origin recognition complex (ORC) is significantly more enriched in domains of the active-loose subclass.
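The one-tailed Mann–Whitney U test used for these comparisons can be sketched in pure Python. This is a simplified version that uses the large-sample normal approximation without tie or continuity corrections, so its p values will differ slightly from library implementations such as SciPy's `mannwhitneyu`; the function name is hypothetical. It tests whether values in `x` tend to be larger than values in `y`.

```python
import math


def mann_whitney_greater(x, y):
    """One-tailed Mann–Whitney U test of H1: x tends to exceed y.

    Returns (U, p), where U counts pairs with x > y (ties count 1/2) and p comes
    from the normal approximation (no tie or continuity correction).
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0  # expected U under the null hypothesis
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability of N(0, 1)
    return u, p
```

For example, passing per-domain gene counts of the active-tight subclass as `x` and of the active-loose subclass as `y` asks whether active-tight domains contain more genes. The pairwise loop is O(n1·n2), which is negligible for 71 versus 423 domains.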
