ChIP-seq: advantages and challenges of a maturing technology

Review

Nat Rev Genet. 2009 Oct;10(10):669-80.

doi: 10.1038/nrg2641. Epub 2009 Sep 8.

Peter J Park. Nat Rev Genet. 2009 Oct.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.

Figures

Figure 1
Figure 1. Overview of a ChIP-Seq experiment

Specific DNA sites that interact with transcription factors or other chromatin-associated proteins, as well as sites that correspond to modified chromatin, can be profiled using chromatin immunoprecipitation (ChIP) followed by massively parallel sequencing. The ChIP process enriches crosslinked proteins or modified chromatin of interest using an antibody specific to the protein or the histone modification. Purified DNA can be sequenced on any of the next-generation platforms [12]. The basic concepts are similar on these platforms: common adaptors are ligated to the ChIP DNA, and clonally clustered amplicons are generated. The sequencing step involves enzyme-driven extension of all templates in parallel, alternating with detection of fluorescent labels incorporated with each extension by high-resolution imaging. On the Illumina/Solexa Genome Analyzer (bottom left), clusters of clonal sequences are generated by bridge PCR, and sequencing is performed by sequencing-by-synthesis. On the 454 and SOLiD platforms (bottom middle), clonal sequencing features are generated by emulsion PCR, with amplicons captured on the surface of μm-scale beads. Beads with amplicons are then recovered and immobilized on a planar substrate to be sequenced by pyrosequencing (454) or by DNA ligase-driven synthesis (SOLiD). On single-molecule sequencing platforms such as the HeliScope by Helicos (bottom right), fluorescent nucleotides incorporated into templates can be imaged at the level of single molecules, thus making clonal amplification unnecessary.

Figure 2
Figure 2. ChIP profiles

(A) An example of ChIP-Seq and ChIP-chip profiles. The figure shows a section of the binding profiles of the chromodomain protein Chromator measured by ChIP-chip (unlogged intensity ratio, blue) and ChIP-Seq (tag density, red) in the D. melanogaster S2 cell line. The tag density profile obtained by ChIP-Seq reveals specific positions of Chromator binding with higher spatial resolution and sensitivity. The ChIP-Seq input DNA (control experiment) tag density is shown (gray) for comparison. (B) Examples of different types of ChIP-Seq tag density profiles. Profiles for different types of proteins and histone marks can have different types of features. For example: sharp binding sites, as shown for the insulator binding protein CTCF (red); a mixture of shapes, as shown for RNA Polymerase II (orange), which has a sharp peak followed by a broad region of enrichment; medium-sized broad peaks, as illustrated by H3K36me3 (green), which is associated with transcription elongation over the gene body; and large domains, as illustrated by H3K27me3 (blue), a repressive mark indicative of Polycomb-mediated silencing. Data for part B are from human T-cells, from REF .

Figure 3
Figure 3. Depth of Sequencing

(A) To determine whether enough tags have been sequenced, a simulation can be carried out to characterize the fraction of the peaks that would be recovered if a smaller number of tags had been sequenced. In many cases, new statistically significant peaks are discovered at a steady rate with an increasing number of tags (solid curve), i.e., there is no saturation of binding sites. However, when a minimum threshold is imposed for the enrichment ratio between ChIP and input DNA peaks, the rate at which new peaks are discovered slows down (dashed curve). That is, saturation of detected binding sites can occur when only sufficiently prominent binding positions are considered. For a given data set, multiple curves corresponding to different thresholds can be examined to identify the threshold at which the curve becomes sufficiently flat to meet the desired saturation criteria (upper right box defined by the red lines). We refer to this threshold as the Minimum Saturation Enrichment Ratio (MSER). MSER can serve as a measure of the depth of sequencing achieved in a data set: a high MSER, for example, indicates that the data set may be under-sampled, as only the more prominent peaks were saturated. See REF Kharchenko et al. for details. (B) There are two ways in which a peak can be more statistically significant than another (lower panels compared to upper panels): a higher enrichment ratio in ChIP compared to control for the same number of tags (shown under the curve in each case) (lower left), or the same enrichment ratio but a larger number of tags (lower right). As the latter case illustrates, there may be no saturation of binding sites when more sequencing makes less prominent peaks more statistically significant.
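The two routes to higher significance described in part B can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the tag count in a window follows a Poisson distribution whose mean is the control tag count; the caption does not specify a statistical model, and the function name `poisson_upper_tail` and all tag counts below are hypothetical.

```python
import math

def poisson_upper_tail(k, lam, max_terms=1000):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0.

    The upper tail is summed directly so that very small p-values
    are not lost to rounding (as they would be with 1 - CDF).
    """
    # First term of the tail, P(X = k), computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    for _ in range(max_terms):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term < total * 1e-16:
            break
    return total

# Hypothetical window counts (ChIP tags, control tags); assumed numbers only.
p_base = poisson_upper_tail(20, 10.0)          # enrichment ratio 2
p_higher_ratio = poisson_upper_tail(40, 10.0)  # ratio 4, same control depth
p_deeper = poisson_upper_tail(40, 20.0)        # ratio 2, twice the depth

print(f"ratio 2, base depth  : {p_base:.3g}")
print(f"ratio 4, base depth  : {p_higher_ratio:.3g}")
print(f"ratio 2, double depth: {p_deeper:.3g}")
```

Under this assumed model, both the higher-ratio window and the deeper-sequenced window yield smaller p-values than the baseline, which is why peaks with a fixed enrichment ratio can keep crossing a significance threshold as more tags are sequenced.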

Figure 4
Figure 4. Overview of ChIP-Seq analysis

The raw data for ChIP-Seq analysis are images from the next-generation sequencing platform (top left). A base-caller converts the image data to sequence tags, which are then aligned to the genome, on some platforms with the aid of quality scores that indicate the reliability of each base call. Peak calling, using the ChIP profile and a control profile (usually input DNA), generates a list of enriched regions ordered by false discovery rate as a statistical measure. Subsequently, the profiles of enriched regions are viewed with a browser and a variety of advanced analyses are performed.
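The final ordering step in this pipeline can be sketched as follows. The caption does not say how the false discovery rate is estimated, so the sketch swaps in the Benjamini-Hochberg adjustment as one common choice; the region names and p-values are hypothetical, and real peak callers may instead estimate FDR empirically from the control sample.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical enriched regions with per-region p-values (assumed data).
regions = ["chr1:1000-1200", "chr1:5000-5300", "chr2:200-450", "chr3:900-1100"]
p_values = [0.001, 0.04, 0.008, 0.30]

q_values = benjamini_hochberg(p_values)
# Order the regions by estimated FDR, most confident first.
ranked = sorted(zip(regions, q_values), key=lambda pair: pair[1])
for name, q in ranked:
    print(f"{name}\tq={q:.4f}")
```

The adjustment multiplies each p-value by the number of tests divided by its rank, then takes a running minimum from the least significant region upward so that the resulting q-values never decrease with rank.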

Figure 5
Figure 5. Strand-specific profile at enriched sites

DNA fragments from a chromatin immunoprecipitation experiment are sequenced from the 5′ end. Thus, alignment of these tags to the genome results in two peaks, one on each strand, flanking the location where the protein or nucleosome of interest was bound. This strand-specific pattern can be used for optimal detection of enriched regions. To approximate the distribution of all fragments, each tag location can be extended by an estimated fragment size in the appropriate orientation and the number of fragments is counted.
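The tag-extension step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: tag positions are 0-based 5′ ends, forward-strand tags extend rightward and reverse-strand tags extend leftward, and the function name `fragment_coverage`, the tag list, and the 200 bp fragment size are all hypothetical.

```python
def fragment_coverage(tags, fragment_size, region_start, region_end):
    """Count extended fragments over each position in [region_start, region_end).

    tags: iterable of (position, strand) pairs, where position is the
    5' end of the tag and strand is "+" or "-".
    """
    coverage = [0] * (region_end - region_start)
    for pos, strand in tags:
        if strand == "+":
            # Forward-strand tag: fragment extends to the right of the 5' end.
            start, end = pos, pos + fragment_size
        else:
            # Reverse-strand tag: fragment extends to the left of the 5' end.
            start, end = pos - fragment_size + 1, pos + 1
        # Clip the fragment to the region being profiled.
        for x in range(max(start, region_start), min(end, region_end)):
            coverage[x - region_start] += 1
    return coverage

# Hypothetical tags flanking a binding site near position 200 (assumed data).
tags = [(100, "+"), (105, "+"), (110, "+"), (290, "-"), (295, "-"), (300, "-")]
coverage = fragment_coverage(tags, fragment_size=200, region_start=0, region_end=400)

peak_height = max(coverage)
peak_start = coverage.index(peak_height)
print(f"maximum coverage {peak_height}, first reached at position {peak_start}")
```

In this toy example the forward-strand and reverse-strand tag clusters sit on either side of the assumed binding site, and extending each tag toward the other strand's cluster merges the two strand-specific peaks into a single region of high fragment count between them.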

References

    1. Farnham PJ. Insights from genomic profiling of transcription factors. Nat Rev Genet. 2009 in press.
    2. Jiang C, Pugh BF. Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet. 2009;10(3):161–72.
    3. Henikoff S. Nucleosome destabilization in the epigenetic regulation of gene expression. Nat Rev Genet. 2008;9(1):15–26.
    4. Li B, et al. The role of chromatin during transcription. Cell. 2007;128(4):707–19.
    5. Allis CD, et al., editors. Epigenetics. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, New York: 2007.
