Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes
Sun Jan 01 2017
Wanding Zhou et al. Nucleic Acids Res. 2017.
Abstract
Illumina Infinium DNA Methylation BeadChips are the most widely used genome-scale DNA methylation assays. Existing strategies for masking Infinium probes that overlap repeats or single nucleotide polymorphisms (SNPs) rely largely on ad hoc assumptions and subjective criteria. In addition, the recently introduced MethylationEPIC (EPIC) array expands the utility of this platform but has not yet been well characterized. Here we present an extensive characterization of probes on the EPIC and HM450 microarrays, including mappability to the latest genome build, genomic copy number of the 3′ nested subsequence, and the influence of polymorphisms, including a previously unrecognized color channel switch for Type I probes. We provide empirical evidence for criteria to exclude underperforming probes, offering a sounder basis than current ad hoc exclusion criteria. We also describe novel probe uses, exemplified by the addition of a total of 1052 SNP probes to the existing 59 explicit SNP probes on the EPIC array and the use of these probes to predict ethnicity. Finally, we present an innovative out-of-band color channel application for the dual use of 62 371 probes as internal bisulfite conversion controls.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Figures

Influence of SNPs on probe performance. (A) Illustration of the Infinium probe design. Type I probes use a pair of methylated (M) and unmethylated (U) probes, designed against the methylated and unmethylated versions of the target DNA, respectively. Signals for these two alleles are measured in the same color channel, determined by the base incorporated (A and T are labeled red, C is labeled green) complementary to the extension base (D = A, T or G in IUPAC code) (top). Type II probes use a single probe, and extension occurs at the target CpG, with a red-labeled A measuring the unmethylated allele and a green-labeled G measuring the methylated allele (middle). For Type I probes, color-channel-switching (CCS) SNPs at the extension base can cause signals to come from the alternative color channel (bottom); green and red signals then reflect different alleles of the SNP. (B) Inter-individual variation (standard deviation, SD) in beta values measured by Type I probes that carry a SNP at a given distance from the 3′ end of the probe (the target CpG). Variation was calculated from the normal samples (n = 705) studied in the TCGA project: SD was first calculated within each tissue type (to avoid variance introduced by tissue-specific methylation) and then averaged over 13 tissue types. (C) The averaged SD of beta values measured by Type II probes with a SNP at a given distance from the 3′ end of the probe, in the same TCGA normal samples. (D) Variation in beta values for Type I probes with CCS SNPs (red), non-channel-switching SNPs (blue), and CCS SNPs after beta-value rescue by combining the two color channels (green; see ‘Materials and Methods’ section), stratified by minor allele frequency.
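The beta-value arithmetic behind panels B to D can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: beta is taken as M/(M + U + α) with a hypothetical offset α = 100, the CCS "rescue" simply pools each probe's signal across the red and green channels (the exact procedure is in the paper's Materials and Methods and may differ), and variation is a per-tissue SD averaged over tissues, as the caption describes. The function names are hypothetical.

```python
import numpy as np

# Hypothetical stabilizing offset added to the beta denominator (an assumption).
ALPHA = 100.0


def beta(m, u, alpha=ALPHA):
    """Beta value from methylated (m) and unmethylated (u) intensities."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    return m / (m + u + alpha)


def beta_type1_combined(m_red, u_red, m_grn, u_grn, alpha=ALPHA):
    """Hypothetical CCS rescue for a Type I probe pair.

    Assumes the M and U probes keep their identity whichever channel a CCS
    SNP pushes the signal into, so summing the red and green intensities of
    each probe recovers its total M and U signal.
    """
    m_total = np.add(m_red, m_grn)
    u_total = np.add(u_red, u_grn)
    return beta(m_total, u_total, alpha)


def mean_within_tissue_sd(betas, tissues):
    """Per-probe SD computed within each tissue, then averaged over tissues.

    betas: (n_samples, n_probes) array of beta values.
    tissues: length-n_samples labels; each tissue needs at least two samples
    for the sample SD (ddof=1) to be defined.
    """
    betas = np.asarray(betas, dtype=float)
    tissues = np.asarray(tissues)
    per_tissue = [betas[tissues == t].std(axis=0, ddof=1) for t in np.unique(tissues)]
    return np.mean(per_tissue, axis=0)


# Usage: a CCS SNP moves most of the methylated signal into the green channel.
in_band_only = beta(0.0, 50.0)  # red channel alone looks fully unmethylated
rescued = beta_type1_combined(0.0, 50.0, 900.0, 50.0)
```

Averaging SDs computed within tissues, rather than one SD over all samples, keeps tissue-specific methylation differences from inflating the apparent SNP-driven variation.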

(A) Distribution of 8620 TCGA samples on the first and second principal components of beta values measured by explicit SNP probes (rs probes designated by the manufacturer). Samples are colored by self-reported ethnicity. (B) Distribution of 8620 TCGA samples on the first and second principal components of allele frequencies recovered from CCS SNP probes. (C) Concordance between self-reported and predicted ethnicity using explicit SNP probes on the test dataset (n = 1103; see Methods). (D) Concordance between self-reported and predicted ethnicity using the recovered CCS SNP probes on the test dataset.
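The two analyses in this figure, principal-component projection of SNP-probe signals and ethnicity prediction scored by concordance, can be sketched as follows. All function names are hypothetical, and two assumptions are labelled in the code: the allele fraction of a CCS SNP is taken as the out-of-band share of a Type I probe's total signal, and a nearest-centroid rule stands in for whatever classifier the authors actually trained.

```python
import numpy as np


def ccs_allele_fraction(in_band, out_of_band):
    """Hypothetical alternate-allele fraction for a Type I probe with a CCS SNP.

    Assumes the reference allele drives the designed (in-band) channel and the
    alternate allele the other (out-of-band) channel; inputs are total (M + U)
    intensities per channel. Diploid genotypes should cluster near 0, 0.5, 1.
    """
    ib = np.asarray(in_band, dtype=float)
    oob = np.asarray(out_of_band, dtype=float)
    return oob / (ib + oob)


def pca_scores(x, n_components=2):
    """Scores of samples (rows of x) on the leading principal components, via SVD."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def nearest_centroid_predict(train_x, train_labels, test_x):
    """Label each test sample with the class whose training centroid is closest.

    A stand-in classifier for illustration only (an assumption).
    """
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    labels = np.asarray(train_labels)
    classes = np.unique(labels)
    centroids = np.stack([train_x[labels == c].mean(axis=0) for c in classes])
    sq_dist = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dist.argmin(axis=1)]


def concordance(predicted, reported):
    """Fraction of samples whose predicted label matches the self-reported one."""
    return float(np.mean(np.asarray(predicted) == np.asarray(reported)))


# Toy usage with two-probe "genotype" vectors.
scores = pca_scores([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
train = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
pred = nearest_centroid_predict(train, ["A", "A", "B", "B"], [[0.05, 0.2], [0.8, 0.95]])
```

Panels A and B correspond to plotting the first two columns of `pca_scores`; panels C and D correspond to `concordance` on a held-out test set.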

(A) Box plots of the copy numbers of the 3′ nested subsequences of EPIC probes in the bisulfite-converted genome, for varying lengths of the 3′ subsequence (x axis). (B) Fraction of all probes with a unique 3′ subsequence of a given length. (C) Total signal intensities (sum of methylated and unmethylated alleles; y axis) for uniquely mappable Type I probes (see Supplementary Data for Type II probes) versus the copy number (x axis) of their 3′ subsequences of 15, 30 and 40 bases, respectively. Signal intensities were measured in a normal colon sample. (D) Linear regression lines showing the dependence of total signal intensity on the genomic copy number of 3′ subsequences of different lengths, for Type I probes (left panel) and Type II probes (right panel). (E) Association between the copy number of the 3′ subsequence and measurement accuracy: averaged absolute differences in beta values between HM450 and WGBS measurements of the same samples (n = 18) are plotted against ranges of the copy number of the 3′ subsequence of 22, 30 and 40 bases.
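The copy number of a probe's 3′ nested subsequence can be illustrated with a toy sketch that counts matches in an in-silico bisulfite-converted sequence. The assumptions here are ours, not the paper's: the probe is given 5′→3′ and hybridizes to the converted strand, so its 3′-terminal k-mer is reverse-complemented before searching; both genomic strands are converted, once with all CpGs methylated and once fully unmethylated, and a hit at the same position in either version counts once. All names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def bisulfite_convert(strand, methylated_cpg):
    """In-silico bisulfite conversion of one strand written 5'->3'.

    Every C becomes T, except Cs followed by G when methylated_cpg is True
    (modelling fully methylated CpGs).
    """
    strand = strand.upper()
    out = []
    for i, base in enumerate(strand):
        protected = methylated_cpg and strand[i + 1:i + 2] == "G"
        out.append("T" if base == "C" and not protected else base)
    return "".join(out)


def match_positions(text, pattern):
    """Start positions of all (possibly overlapping) occurrences of pattern."""
    positions, start = set(), text.find(pattern)
    while start != -1:
        positions.add(start)
        start = text.find(pattern, start + 1)
    return positions


def nested_3prime_copy_number(probe, genome, k):
    """Copies of the target of a probe's 3'-terminal k bases in a converted genome."""
    target_kmer = revcomp(probe[-k:])  # the probe hybridizes to its complement
    total = 0
    for strand in (genome.upper(), revcomp(genome)):
        hits = match_positions(bisulfite_convert(strand, True), target_kmer)
        hits |= match_positions(bisulfite_convert(strand, False), target_kmer)
        total += len(hits)
    return total


# Toy usage: the probe's last 4 bases (TAAA) target TTTA, which appears once
# after converting CCCA (C -> T) and not at all on the reverse strand.
example = nested_3prime_copy_number("GGGTAAA", "CCCA", 4)
```

String search is only practical for toy inputs; a genome-scale count would rely on an index such as a suffix array or a k-mer hash table.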

(A) Illustration of the use of CpC and TpC probes as bisulfite conversion controls. Incomplete conversion at C extension sites (CpC probes, 5′ CYG in template DNA; left) leads to hybridization of a green-fluorescent G with the retained C. After successful bisulfite conversion, a CpC probe should behave like a TpC probe (5′ TYG in template DNA), leading to hybridization of a red-fluorescent A with T (right). In contrast, probes with a T reference allele at the extension site (TpC probes) should show no green signal in the absence of SNPs, so any green signal for these probes should reflect background. (B) Mean green signal across all 46 733 probes for CpC (y axis) versus TpC probes (x axis) in 8652 TCGA samples. (C) Mean CpH beta values versus the green CpC-to-TpC (GCT) score for TCGA samples (n = 8652). Testicular germ cell tumors (TGCT) in the TCGA datasets are colored by histology; other tumors are black. Overall Pearson's correlation coefficient = 0.2, P < 1e-10. (D) Top left: density plots of CpG beta values for the cell-line replicates with the highest (green) and lowest (red) GCT scores. Top right: density plots of CpG beta values for the ten lung adenocarcinoma (LUAD) samples with the highest GCT scores (green) and the ten LUAD samples with the lowest GCT scores. The same comparison is repeated for prostate adenocarcinoma (PRAD, bottom left) and bladder urothelial carcinoma (BLCA, bottom right).
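Panels B and C rest on comparing the out-of-band (green) signal of CpC and TpC probes, both of which are Type I probes read in the red channel. A minimal sketch, assuming the GCT score is the ratio of the mean green intensity of CpC probes to that of TpC probes in each sample (so a value near 1 is what complete conversion predicts, and higher values point to retained cytosines), with Pearson's r computed as in panel C. Function names are hypothetical.

```python
import numpy as np


def gct_scores(green, is_cpc, is_tpc):
    """Per-sample GCT score: mean green signal of CpC probes / mean of TpC probes.

    green: (n_samples, n_probes) green-channel (out-of-band) intensities of
        Type I red-channel probes.
    is_cpc, is_tpc: boolean masks over probes selecting the two probe sets.
    """
    green = np.asarray(green, dtype=float)
    cpc = np.asarray(is_cpc, dtype=bool)
    tpc = np.asarray(is_tpc, dtype=bool)
    return green[:, cpc].mean(axis=1) / green[:, tpc].mean(axis=1)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])


green = [[300.0, 100.0, 200.0, 100.0],   # sample with retained Cs
         [100.0, 100.0, 100.0, 100.0]]   # sample consistent with full conversion
scores = gct_scores(green, [True, False, True, False], [False, True, False, True])
```

Using TpC probes as the denominator normalizes each sample by its own green background, which is why the two probe sets are compared within a sample rather than across samples.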

(A) Number of probes in each masking category, including previous practices that we do not recommend. Black boxes indicate recommended masking. (B) Circos plot of the distribution of (i) the number of EPIC probes in each 1-Mb bin; (ii) the number of NCBI RefGene genes; (iii) the number of probes masked because of mapping issues; (iv) the number of probes masked because of a non-unique 3′ subsequence of 30 bases or longer; (v) the number of probes masked for SNPs using global allele frequencies from the entire 1000 Genomes Project population. (C) The resulting population-specific masking, obtained by merging population-specific SNP masking with the other recommended masking (shown in panel A).
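Combining masks as in panel C amounts to a logical OR over per-probe boolean flags. The sketch below assumes each recommended category is already a boolean vector and that each probe has been assigned the maximum minor allele frequency (MAF) of the SNPs it overlaps in each population; the dictionary layout, the names and the 0.01 MAF cutoff are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def population_mask(base_masks, max_maf_by_pop, population, maf_cutoff=0.01):
    """Union of the recommended masks and a population-specific SNP mask.

    base_masks: dict mapping category name -> boolean array (True = mask).
    max_maf_by_pop: dict mapping population -> per-probe maximum MAF of
        overlapping SNPs (0.0 where the probe overlaps no SNP).
    maf_cutoff: hypothetical threshold at or above which a SNP triggers masking.
    """
    maf = np.asarray(max_maf_by_pop[population], dtype=float)
    mask = maf >= maf_cutoff
    for flags in base_masks.values():
        mask |= np.asarray(flags, dtype=bool)
    return mask


base = {
    "mapping": [True, False, False, False],
    "nonunique_3prime_30": [False, True, False, False],
}
maf = {"EUR": [0.0, 0.0, 0.2, 0.001], "AFR": [0.0, 0.0, 0.0, 0.3]}
eur_mask = population_mask(base, maf, "EUR")
afr_mask = population_mask(base, maf, "AFR")
```

Only the SNP component changes between populations; the mapping and 3′-subsequence masks are shared, which is why the merged masks differ only at SNP-affected probes.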