Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India - PubMed

T J Pemberton et al. Ann Hum Genet. 2008 Jul.

Abstract

When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis - such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.

Figures

**Figure 1**
Linkage disequilibrium vs. physical distance. The r² statistic was calculated for each pair of SNPs with MAF≥0.1. The mean r² for a given distance bin is plotted as a function of the mean distance between pairs of SNPs with distance in the bin. Bin size is 6 kb. Each line represents a separate population.
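The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes phased haplotypes coded as 0/1 alleles (the caption does not say how r² was estimated), and the names `r_squared` and `mean_r2_by_distance` are hypothetical. Only the MAF≥0.1 filter and the 6 kb bin size come from the caption.

```python
from collections import defaultdict


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased 0/1 alleles per haplotype."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either SNP is monomorphic
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def mean_r2_by_distance(positions, haplotypes, bin_size=6000, min_maf=0.1):
    """Return (mean distance, mean r^2) per distance bin, sorted by distance.

    positions[i] is the base-pair coordinate of SNP i; haplotypes[i] is its
    list of 0/1 alleles across all sampled chromosomes (assumed input layout).
    """
    def maf(hap):
        p = sum(hap) / len(hap)
        return min(p, 1 - p)

    keep = [i for i, hap in enumerate(haplotypes) if maf(hap) >= min_maf]
    bins = defaultdict(list)
    for x, i in enumerate(keep):
        for j in keep[x + 1:]:
            r2 = r_squared(haplotypes[i], haplotypes[j])
            if r2 is None:
                continue
            dist = abs(positions[j] - positions[i])
            bins[dist // bin_size].append((dist, r2))

    summary = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_dist = sum(d for d, _ in pairs) / len(pairs)
        mean_r2 = sum(r for _, r in pairs) / len(pairs)
        summary.append((mean_dist, mean_r2))
    return summary
```

Running the function once per population and plotting mean r² against mean distance would give one line per population, as in the figure.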

**Figure 2**
The fraction of common haplotypes (≥10% frequency) in individual populations that are also common in the HapMap. For each plot we used haplotypes based on the SNPs that overlap between HapMap Phase II and our autosomal core regions, and we averaged over all windows of a given length. The graph on the right shows the fraction of the common haplotypes of a population that are also common in the most similar HapMap sample (determined point by point). Thus, for each population and each window size, the rightmost panel takes the highest value among those shown in the other three panels. The non-African populations with the lowest level of coverage by the most similar HapMap population are labeled in the rightmost panel.
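The sharing statistic in this caption can be illustrated for a single window. The sketch below assumes haplotypes are represented as strings of alleles over the window's SNPs; the function names are hypothetical, and the averaging over windows of a given length is left out.

```python
from collections import Counter


def common_haplotypes(haps, min_freq=0.10):
    """Set of haplotypes whose sample frequency is at least min_freq."""
    counts = Counter(haps)
    n = len(haps)
    return {h for h, c in counts.items() if c / n >= min_freq}


def shared_common_fraction(target, reference, min_freq=0.10):
    """Fraction of the target's common haplotypes that are also common
    in the reference sample; None if the target has no common haplotype."""
    target_common = common_haplotypes(target, min_freq)
    if not target_common:
        return None
    reference_common = common_haplotypes(reference, min_freq)
    return len(target_common & reference_common) / len(target_common)
```

The rightmost panel would then correspond to taking, for each population and window size, the maximum of this fraction over the three HapMap samples.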

**Figure 3**
Portability of tag SNPs chosen using the individual HapMap populations and optimal HapMap mixtures, for each of the 55 populations (as measured by PVT). (A) The proportion of polymorphic non-tag SNPs with MAF>0.05 in the target population that have r²≥0.85 with at least one tag SNP (PVT). PVT is plotted only for the HapMap group that produced the highest PVT. For each population, the color of the bar indicates the HapMap sample from which the optimal tag SNP set was chosen (blue=CEU, pink=CHB+JPT, orange=YRI). The vertical line indicates 50% tag portability. (B) The highest PVT obtained using tag SNP panels from HapMap mixtures. The black portion of the bar represents the increase in PVT obtained using tag SNPs from the optimal HapMap mixture compared to the most effective individual HapMap sample. (C) The proportions of the three HapMap populations in the optimal HapMap mixture that produced the highest PVT (blue=CEU, pink=CHB+JPT, orange=YRI). In the Surui and Colombian populations, multiple mixtures produced PVT values above 1, and the optimal mixture was chosen as the one with the highest unadjusted PVT (the same procedure was applied for Surui in part A).
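As a rough illustration of the PVT definition in part (A), the sketch below computes the proportion of non-tag SNPs with MAF>0.05 that reach r²≥0.85 with at least one tag SNP. It assumes phased 0/1 haplotypes in the target population and uses hypothetical names (`pvt`, `target_haps`, `tag_ids`). It corresponds to an unadjusted value only: the caption implies an adjustment that can push PVT above 1, and that adjustment is not described here, so it is not modeled.

```python
def maf(hap):
    """Minor allele frequency of a SNP from its 0/1 haplotype alleles."""
    p = sum(hap) / len(hap)
    return min(p, 1 - p)


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs; 0.0 if either is monomorphic."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return (p_ab - p_a * p_b) ** 2 / denom


def pvt(target_haps, tag_ids, min_maf=0.05, r2_threshold=0.85):
    """Proportion of eligible non-tag SNPs captured by at least one tag SNP.

    target_haps maps SNP id -> list of 0/1 alleles in the target population;
    tag_ids is the set of SNP ids chosen as tags in a reference sample.
    """
    tags = [target_haps[t] for t in tag_ids if t in target_haps]
    candidates = [
        hap for snp, hap in target_haps.items()
        if snp not in tag_ids and maf(hap) > min_maf
    ]
    if not candidates:
        return None
    captured = sum(
        1 for hap in candidates
        if any(r_squared(hap, tag) >= r2_threshold for tag in tags)
    )
    return captured / len(candidates)
```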

**Figure 4**
Portability in the Tamilians and Bengalis of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. Each vertex of the triangle represents one of the three HapMap populations (CEU, CHB+JPT, YRI), with increasing distance from that vertex indicating a smaller percentage of that HapMap population present in the population mixture. The shading represents the level of portability as measured by PVT. Note that the darkest and lightest colors represent wider ranges of PVT values than the other colors. A black circle indicates the combination of the three HapMap samples that produces the highest PVT among the points tested (80% CEU, 5% CHB+JPT, 15% YRI for Tamilians; 60% CEU, 40% CHB+JPT, 0% YRI for Bengalis).
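The search over mixtures shown in the triangle can be sketched as a grid over the three-sample simplex. The 5% step is an assumption (it is consistent with the reported optima but is not stated in the caption), and `score` is a hypothetical callback standing in for whatever procedure selects tag SNPs from a mixture and evaluates PVT in the target population.

```python
def simplex_grid(step=5):
    """Yield (CEU, CHB+JPT, YRI) percentage triples that sum to 100.

    step is assumed to divide 100 evenly.
    """
    for ceu in range(0, 101, step):
        for chb_jpt in range(0, 101 - ceu, step):
            yield ceu, chb_jpt, 100 - ceu - chb_jpt


def best_mixture(score, step=5):
    """Return the grid point with the highest score.

    score takes a (CEU, CHB+JPT, YRI) tuple and returns a number,
    for example the PVT obtained with tag SNPs chosen from that mixture.
    """
    return max(simplex_grid(step), key=score)
```

With a 5% step the grid contains 231 mixtures, including the three vertices that correspond to the individual HapMap samples.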

**Figure 5**
Portability in individual populations of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. The figure design follows that of Figure 4, with a different color scale.

Cited by

MaCH-admix: genotype imputation for admixed populations.
Liu EY, Li M, Wang W, Li Y. Genet Epidemiol. 2013 Jan;37(1):25-37. doi: 10.1002/gepi.21690. Epub 2012 Oct 16. PMID: 23074066. Free PMC article.
Single nucleotide polymorphisms and haplotypes in Native American populations.
Kidd JR, Friedlaender F, Pakstis AJ, Furtado M, Fang R, Wang X, Nievergelt CM, Kidd KK. Am J Phys Anthropol. 2011 Dec;146(4):495-502. doi: 10.1002/ajpa.21560. Epub 2011 Sep 13. PMID: 21913176. Free PMC article.
Reconstructing Indian population history.
Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Nature. 2009 Sep 24;461(7263):489-94. doi: 10.1038/nature08365. PMID: 19779445. Free PMC article.
Weighted likelihood inference of genomic autozygosity patterns in dense genotype data.
Blant A, Kwong M, Szpiech ZA, Pemberton TJ. BMC Genomics. 2017 Dec 1;18(1):928. doi: 10.1186/s12864-017-4312-3. PMID: 29191164. Free PMC article.
Use of weighted reference panels based on empirical estimates of ancestry for capturing untyped variation.
Egyud MR, Gajdos ZK, Butler JL, Tischfield S, Le Marchand L, Kolonel LN, Haiman CA, Henderson BE, Hirschhorn JN. Hum Genet. 2009 Apr;125(3):295-303. doi: 10.1007/s00439-009-0627-8. Epub 2009 Jan 28. PMID: 19184111. Free PMC article.
