Coverage and characteristics of the Affymetrix GeneChip Human Mapping 100K SNP set

Dan L Nicolae et al. PLoS Genet. 2006 May.

Abstract

Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.
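For readers less familiar with the pairwise measure mentioned in the abstract, the following is a minimal sketch of the standard r² statistic for two biallelic loci, computed from phased haplotypes. It is not the authors' multilocus coefficient, whose definition is not given here; the function name `pairwise_r2` and the 0/1 allele coding are assumptions made for illustration.

```python
import numpy as np


def pairwise_r2(hap_a, hap_b):
    """Standard pairwise LD r^2 between two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical coding chosen for this sketch).
    Returns NaN when either locus is monomorphic.
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("haplotype vectors must have the same length")

    p_a = a.mean()          # frequency of allele 1 at locus A
    p_b = b.mean()          # frequency of allele 1 at locus B
    p_ab = (a * b).mean()   # frequency of the 1-1 haplotype

    d = p_ab - p_a * p_b    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom
```

Two loci that always carry the same allele on every haplotype give r² = 1, while loci whose alleles occur independently give r² = 0.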

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Factors Explaining the 100K Set Chromosomal Density

Chromosomes mentioned in the text are labeled, as are chromosomes that have similar HapMap densities to those that are discussed. (A) Plots the density (number of SNPs/100 kbp) of the Affymetrix 100K set for the 22 autosomal chromosomes against the average number of SNPs/100 kb in the HapMap database. (B) Plots the density of the 100K set in each chromosome as a function of the percentage of HapMap SNPs in that chromosome located in exons (synonymous or nonsynonymous).

Figure 2. Histograms of MAF for HapMap and 100K SNP Sets

Figure 3. Information on HapMap Variation

(A) Shows the distribution of the measure MD for all markers in the HapMap SNP set, including the markers in the Affymetrix 100K set. The part of the rightmost bar above the horizontal line corresponds to the markers in the 100K SNP set. (B) Displays a similar histogram for the maximum r² value between each marker in HapMap and the 100K SNP set markers within 200 kb of them.

Figure 4. The Variability in Chromosomal Information

The median value of the multilocus measure of LD, MD, is plotted for each chromosome for intergenic SNPs located more than 2 kb from a gene (filled circles) and for nonsynonymous SNPs within exons (open squares).

Figure 5. Differences in Pairwise and Multilocus LD

Summary of LD estimated with the multilocus measure, MD, and maximum of the pairwise measure, r², for HapMap SNPs in a 370-kb region on Chromosome 1 starting at rs666371 (20511030 bp on HapMap physical map). The filled triangles at the top indicate the location of nine SNPs that are part of the 100K set. For each HapMap SNP, the value of MD was calculated using all 100K-set SNPs within 200 kb (filled circles), and similarly we calculated the maximum value of r² between the HapMap SNP and the same set of the array SNPs (open circles). The vertical lines connect the values of MD and max r² for each marker.
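The "maximum pairwise r² within 200 kb" summary described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names, the 0/1 haplotype coding, and the choice to return 0.0 when no array SNP falls inside the window are all hypothetical.

```python
import numpy as np


def r2(hap_a, hap_b):
    """Standard pairwise r^2 from 0/1 phased haplotype vectors."""
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()
    d = (a * b).mean() - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return float("nan") if denom == 0 else d * d / denom


def max_r2_in_window(target_pos, target_hap, array_pos, array_haps,
                     window_bp=200_000):
    """Largest r^2 between a target SNP and array SNPs within window_bp.

    array_pos: base-pair positions of the array SNPs.
    array_haps: one 0/1 haplotype vector per array SNP.
    Returns 0.0 if no informative array SNP lies inside the window
    (an assumption of this sketch, not a convention from the paper).
    """
    best = 0.0
    for pos, hap in zip(array_pos, array_haps):
        if abs(pos - target_pos) <= window_bp:
            value = r2(target_hap, hap)
            if not np.isnan(value):
                best = max(best, value)
    return best
```

Because only the single best-correlated array SNP contributes, this summary cannot reflect information carried jointly by several nearby SNPs, which is the situation the caption contrasts with the multilocus measure.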

Figure 6. Histogram Summary of Information Content as Assessed by MD in the 100K SNP Set

The value of MD was calculated for each SNP in the 100K SNP set using only the 100K SNPs from that region, but excluding the actual SNP under consideration.

Figure 7. The Distribution of MAFs for Low-Information SNPs

(A) Shows the histogram of the MAFs for all HapMap markers for which the 100K SNP set does not provide information (MD < 0.05). (B) Shows the subset of the markers described in (A) that have at least 10 SNPs from the 100K SNP set within 100 kb.

Figure 8. The Contrast between CEU and YRI LD Patterns

Histogram of the difference between MD values calculated in the European HapMap and African HapMap samples (MD-European − MD-African) when the Affymetrix 100K SNP subset phased in both samples was used to provide information on the entire HapMap SNP set.
