Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India

T J Pemberton et al. Ann Hum Genet. 2008 Jul.

Abstract

When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases from other populations for study design and analysis, for example in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.

Figures

Figure 1

Linkage disequilibrium vs. physical distance. The r² statistic was calculated for each pair of SNPs with MAF ≥ 0.1. The mean r² for a given distance bin is plotted as a function of the mean distance between pairs of SNPs with distance in the bin. Bin size is 6 kb. Each line represents a separate population.
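The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes phased haplotypes coded as 0/1 alleles (the caption does not say whether r² was computed from phased data), and the function names `r_squared`, `maf`, and `mean_r2_by_distance` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def maf(alleles):
    """Minor allele frequency of one biallelic SNP (0/1 allele per haplotype)."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """r^2 between two biallelic SNPs, assuming phased 0/1 haplotype data."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either SNP is monomorphic
    d = p_ab - p_a * p_b
    return d * d / denom


def mean_r2_by_distance(positions, haplotypes, bin_size=6000, min_maf=0.1):
    """Return {bin index: (mean distance in bp, mean r^2)} over SNP pairs.

    positions  -- base-pair coordinate of each SNP
    haplotypes -- one list of 0/1 alleles per SNP, same order as positions
    """
    keep = [i for i, snp in enumerate(haplotypes) if maf(snp) >= min_maf]
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # bin -> [sum dist, sum r2, count]
    for i, j in combinations(keep, 2):
        dist = abs(positions[j] - positions[i])
        r2 = r_squared(haplotypes[i], haplotypes[j])
        cell = acc[dist // bin_size]
        cell[0] += dist
        cell[1] += r2
        cell[2] += 1
    return {b: (s_dist / n, s_r2 / n) for b, (s_dist, s_r2, n) in acc.items()}
```

Running this once per population and plotting the second element of each tuple against the first would give one line per population, as in the figure.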

Figure 2

The fraction of common haplotypes (≥10% frequency) in individual populations that are also common in the HapMap. For each plot we used haplotypes based on the SNPs that overlap between HapMap Phase II and our autosomal core regions, and we averaged over all windows of a given length. The graph on the right shows the fraction of the common haplotypes of a population that are also common in the most similar HapMap sample (determined point by point). Thus, for each population and each window size, the rightmost panel takes the highest value among those shown in the other three panels. The non-African populations with the lowest level of coverage by the most similar HapMap population are labeled in the rightmost panel.
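As a rough illustration of the quantity plotted here, the sketch below computes the fraction of a population's common haplotypes that are also common in a reference sample, averages it over sliding windows, and picks the best-matching reference. It assumes haplotypes are given as equal-length strings with one character per SNP; the function names and the input layout are hypothetical, and how the authors defined windows in detail is not stated in the caption.

```python
from collections import Counter


def common_haplotypes(haps, min_freq=0.10):
    """Set of haplotypes whose sample frequency is at least min_freq."""
    counts = Counter(haps)
    n = len(haps)
    return {h for h, c in counts.items() if c / n >= min_freq}


def fraction_shared(target_haps, reference_haps, min_freq=0.10):
    """Fraction of the target's common haplotypes that are common in the reference."""
    target_common = common_haplotypes(target_haps, min_freq)
    if not target_common:
        return None
    reference_common = common_haplotypes(reference_haps, min_freq)
    return len(target_common & reference_common) / len(target_common)


def mean_fraction_over_windows(target, reference, window, min_freq=0.10):
    """Average fraction_shared over all windows of `window` consecutive SNPs."""
    n_snps = len(target[0])
    fractions = []
    for start in range(n_snps - window + 1):
        t = [h[start:start + window] for h in target]
        r = [h[start:start + window] for h in reference]
        f = fraction_shared(t, r, min_freq)
        if f is not None:
            fractions.append(f)
    return sum(fractions) / len(fractions) if fractions else None


def best_reference(target, references, window, min_freq=0.10):
    """Reference name and value with the highest mean shared fraction.

    Assumes at least one reference yields a defined (non-None) fraction.
    """
    scores = {name: mean_fraction_over_windows(target, ref, window, min_freq)
              for name, ref in references.items()}
    name = max((k for k, v in scores.items() if v is not None), key=scores.get)
    return name, scores[name]
```

Calling `best_reference` separately for each window size mirrors the "point by point" choice of the most similar HapMap sample described for the rightmost panel.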

Figure 3

Portability of tag SNPs chosen using the individual HapMap populations and optimal HapMap mixtures, for each of the 55 populations (as measured by PVT). (A) The proportion of polymorphic non-tag SNPs with MAF > 0.05 in the target population that have r² ≥ 0.85 with at least one tag SNP (PVT). PVT is plotted only for the HapMap group that produced the highest PVT. For each population, the color of the bar indicates the HapMap sample from which the optimal tag SNP set was chosen (blue=CEU, pink=CHB+JPT, orange=YRI). The vertical line indicates 50% tag portability. (B) The highest PVT obtained using tag SNP panels from HapMap mixtures. The black portion of the bar represents the increase in PVT obtained using tag SNPs from the optimal HapMap mixture compared to the most effective individual HapMap sample. (C) The proportions of the three HapMap populations in the optimal HapMap mixture that produced the highest PVT (blue=CEU, pink=CHB+JPT, orange=YRI). In the Surui and Colombian populations, multiple mixtures produced PVT values above 1, and the optimal mixture was chosen as the one with the highest unadjusted PVT (the same procedure was applied for Surui in part A).
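The definition of PVT in part (A) can be written out as a short sketch. It computes only the plain proportion stated in the caption; the caption's mention of "unadjusted" PVT suggests the paper also applies an adjustment, which is not reproduced here. Phased 0/1 haplotype input and the function names are assumptions made for illustration.

```python
def maf(alleles):
    """Minor allele frequency of one biallelic SNP (0/1 allele per haplotype)."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """r^2 between two biallelic SNPs, assuming phased 0/1 haplotype data."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either SNP is monomorphic
    d = p_ab - p_a * p_b
    return d * d / denom


def is_tagged(snp, tag_snps, r2_threshold):
    """True if the SNP reaches the r^2 threshold with at least one tag SNP."""
    for tag in tag_snps:
        r2 = r_squared(snp, tag)
        if r2 is not None and r2 >= r2_threshold:
            return True
    return False


def pvt(haplotypes, tag_indices, min_maf=0.05, r2_threshold=0.85):
    """Proportion of non-tag SNPs with MAF > min_maf captured by the tag set.

    haplotypes  -- one list of 0/1 alleles per SNP, from the target population
    tag_indices -- indices of tag SNPs (chosen elsewhere, e.g. in a HapMap sample)
    """
    tags = set(tag_indices)
    tag_snps = [haplotypes[t] for t in tags]
    evaluated = captured = 0
    for i, snp in enumerate(haplotypes):
        if i in tags or maf(snp) <= min_maf:
            continue  # skip tag SNPs and rare or monomorphic SNPs
        evaluated += 1
        if is_tagged(snp, tag_snps, r2_threshold):
            captured += 1
    return captured / evaluated if evaluated else None
```

The key design point is that tags are selected in a reference sample but LD is evaluated in the target population, which is what makes PVT a measure of portability.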

Figure 4

Portability in the Tamilians and Bengalis of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. Each vertex of the triangle represents one of the three HapMap populations (CEU, CHB+JPT, YRI), with increasing distance from that vertex indicating a smaller percentage of that HapMap population present in the population mixture. The shading represents the level of portability as measured by PVT. Note that the darkest and lightest colors represent wider ranges of PVT values than the other colors. A black circle indicates the combination of the three HapMap samples that produces the highest PVT among the points tested (80% CEU, 5% CHB+JPT, 15% YRI for Tamilians; 60% CEU, 40% CHB+JPT, 0% YRI for Bengalis).
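The search over mixtures shown in the triangle can be sketched as a grid over the three-population simplex. The 5% step is an assumption inferred from the reported optima (80/5/15 and 60/40/0); the caption does not state the grid spacing. The `score` callable is a hypothetical stand-in for "select tag SNPs from this mixture and measure PVT in the target population".

```python
def mixture_grid(step=5):
    """All (CEU, CHB+JPT, YRI) percentage triples on a grid that sum to 100."""
    return [(ceu, asn, 100 - ceu - asn)
            for ceu in range(0, 101, step)
            for asn in range(0, 101 - ceu, step)]


def best_mixture(score, step=5):
    """Grid point with the highest score.

    score -- caller-supplied function mapping a (CEU, CHB+JPT, YRI) triple
             to a portability value such as PVT (hypothetical interface)
    """
    return max(mixture_grid(step), key=score)


if __name__ == "__main__":
    # Toy score peaking at the Tamilian optimum reported in the caption;
    # a real score would rerun tag selection and PVT for each mixture.
    def toy(m):
        return -((m[0] - 80) ** 2 + (m[1] - 5) ** 2 + (m[2] - 15) ** 2)

    print(best_mixture(toy))
```

The three vertices of the triangle correspond to the grid points (100, 0, 0), (0, 100, 0), and (0, 0, 100), that is, the individual HapMap samples.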

Figure 5

Portability in individual populations of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. The figure design follows that of Figure 4, with a different color scale.
