Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India
T J Pemberton et al. Ann Hum Genet. 2008 Jul.
Abstract
When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis, such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach by applying African, European, and East Asian HapMap samples to tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.
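A database "mixture" can be made concrete with a minimal Python sketch. It assumes, for illustration only, that a mixture is realized by pooling a fixed number of reference haplotypes drawn without replacement from each HapMap panel in proportion to the mixture weights. The abstract does not say how mixtures were assembled, and the function names `mixture_counts` and `build_mixture_panel` are hypothetical.

```python
import random

# Illustrative sketch; function names and the pooling scheme are hypothetical.


def mixture_counts(weights, total):
    """Split `total` reference haplotypes among panels in proportion to `weights`.

    Uses largest-remainder rounding so the counts always sum to `total`
    (assumes the weights sum to 1).
    """
    raw = {pop: w * total for pop, w in weights.items()}
    counts = {pop: int(r) for pop, r in raw.items()}
    leftover = total - sum(counts.values())
    # Give any remaining slots to the panels with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda p: raw[p] - counts[p], reverse=True)
    for pop in by_remainder[:leftover]:
        counts[pop] += 1
    return counts


def build_mixture_panel(panels, weights, total, seed=0):
    """Pool haplotypes sampled without replacement from each reference panel.

    `panels` maps a population label to a list of haplotypes; each panel must
    hold at least as many haplotypes as it is asked to contribute.
    """
    rng = random.Random(seed)
    mixed = []
    for pop, k in mixture_counts(weights, total).items():
        mixed.extend(rng.sample(panels[pop], k))
    return mixed


# Weights equal to the 80% CEU / 5% CHB+JPT / 15% YRI mixture in the Tamilian legend.
weights = {"CEU": 0.80, "CHB+JPT": 0.05, "YRI": 0.15}
print(mixture_counts(weights, 60))  # {'CEU': 48, 'CHB+JPT': 3, 'YRI': 9}
```

Largest-remainder rounding is used so that awkward weights (for example, thirds) still yield a panel of exactly the requested size.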
Figures

Linkage disequilibrium vs. physical distance. The r2 statistic was calculated for each pair of SNPs with MAF≥0.1. The mean r2 for a given distance bin is plotted as a function of the mean distance between pairs of SNPs with distance in the bin. Bin size is 6 kb. Each line represents a separate population.
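The binning behind this plot can be sketched in Python. The sketch assumes phased haplotypes with alleles coded 0/1 and computes r² from haplotype frequencies as D²/[p_A(1-p_A)p_B(1-p_B)], reading "6 kb" as 6000 bp. The legend does not say whether r² was estimated from phased haplotypes or from genotypes, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative sketch; function names are hypothetical.


def minor_allele_freq(alleles):
    """MAF of a biallelic SNP given its 0/1 alleles across haplotypes."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """r^2 between two SNPs from phased 0/1 alleles on the same haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def ld_decay(positions, snps, min_maf=0.1, bin_size=6000):
    """Mean r^2 and mean pairwise distance for each fixed-width distance bin.

    `positions` are base-pair coordinates; `snps[i]` holds the alleles of SNP i.
    Only SNPs with MAF >= min_maf are paired. Returns (mean distance, mean r^2)
    for each non-empty bin, ordered by distance.
    """
    keep = [i for i, s in enumerate(snps) if minor_allele_freq(s) >= min_maf]
    dist_sum, r2_sum, count = defaultdict(float), defaultdict(float), defaultdict(int)
    for i, j in combinations(keep, 2):
        dist = abs(positions[j] - positions[i])
        b = dist // bin_size
        dist_sum[b] += dist
        r2_sum[b] += r_squared(snps[i], snps[j])
        count[b] += 1
    return [(dist_sum[b] / count[b], r2_sum[b] / count[b]) for b in sorted(count)]


# Toy data: SNP 3 is monomorphic and is filtered out by the MAF threshold.
positions = [0, 1000, 10000, 12000]
snps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]]
print(ld_decay(positions, snps))  # [(1000.0, 1.0), (9500.0, 0.0)]
```

Plotting one such curve per population reproduces the layout described in the legend.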

The fraction of common haplotypes (≥10% frequency) in individual populations that are also common in the HapMap. For each plot we used haplotypes based on the SNPs that overlap between HapMap Phase II and our autosomal core regions, and we averaged over all windows of a given length. The graph on the right shows the fraction of the common haplotypes of a population that are also common in the most similar HapMap sample (determined point by point). Thus, for each population and each window size, the rightmost panel takes the highest value among those shown in the other three panels. The non-African populations with the lowest level of coverage by the most similar HapMap population are labeled in the rightmost panel.
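The coverage statistic in this legend can be illustrated with a short Python sketch. It assumes haplotypes are 0/1 sequences over the same ordered SNPs, uses overlapping windows of consecutive SNPs, and treats "most similar HapMap sample" as the reference with the highest window-averaged value for a given window length. Windowing details are not given in the legend, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative sketch; function names and the windowing scheme are hypothetical.


def common_haplotypes(haplotypes, min_freq=0.1):
    """Set of haplotypes (as tuples) whose sample frequency is at least `min_freq`."""
    counts = Counter(tuple(h) for h in haplotypes)
    n = len(haplotypes)
    return {h for h, c in counts.items() if c / n >= min_freq}


def shared_common_fraction(target, reference, min_freq=0.1):
    """Fraction of the target's common haplotypes that are also common in `reference`."""
    target_common = common_haplotypes(target, min_freq)
    if not target_common:
        return None
    reference_common = common_haplotypes(reference, min_freq)
    return len(target_common & reference_common) / len(target_common)


def mean_over_windows(target, reference, window, min_freq=0.1):
    """Average the shared fraction over all windows of `window` consecutive SNPs."""
    n_snps = len(target[0])
    fractions = []
    for start in range(n_snps - window + 1):
        t = [h[start:start + window] for h in target]
        r = [h[start:start + window] for h in reference]
        f = shared_common_fraction(t, r, min_freq)
        if f is not None:
            fractions.append(f)
    return sum(fractions) / len(fractions) if fractions else None


def best_reference(target, references, window, min_freq=0.1):
    """Label and value of the reference panel with the highest window-averaged fraction."""
    scores = {name: mean_over_windows(target, ref, window, min_freq)
              for name, ref in references.items()}
    scores = {name: v for name, v in scores.items() if v is not None}
    name = max(scores, key=scores.get)
    return name, scores[name]


target = [[0, 0], [0, 0], [1, 1], [1, 1]]
reference = [[0, 0], [0, 0], [0, 1], [0, 1]]
print(mean_over_windows(target, reference, 1))  # 0.75
print(mean_over_windows(target, reference, 2))  # 0.5
```

Calling `best_reference` once per window length mirrors the "point by point" choice described for the rightmost panel.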

Portability of tag SNPs chosen using the individual HapMap populations and optimal HapMap mixtures, for each of the 55 populations (as measured by PVT). (A) The proportion of polymorphic non-tag SNPs with MAF>0.05 in the target population that have r2≥0.85 with at least one tag SNP (PVT). PVT is plotted only for the HapMap group that produced the highest PVT. For each population, the color of the bar indicates the HapMap sample from which the optimal tag SNP set was chosen (blue=CEU, pink=CHB+JPT, orange=YRI). The vertical line indicates 50% tag portability. (B) The highest PVT obtained using tag SNP panels from HapMap mixtures. The black portion of the bar represents the increase in PVT obtained using tag SNPs from the optimal HapMap mixture compared to the most effective individual HapMap sample. (C) The proportions of the three HapMap populations in the optimal HapMap mixture that produced the highest PVT (blue=CEU, pink=CHB+JPT, orange=YRI). In the Surui and Colombian populations, multiple mixtures produced PVT values above 1, and the optimal mixture was chosen as the one with the highest unadjusted PVT (the same procedure was applied for Surui in part A).
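The PVT definition in panel (A) translates directly into a small Python function. This sketch computes only the unadjusted proportion. The adjustment that lets some values exceed 1 is not described in this legend and is not reproduced. It assumes phased 0/1 alleles in the target population, with tag SNPs given as indices into the same SNP list. Function names are hypothetical.

```python
# Illustrative sketch of the unadjusted PVT; function names are hypothetical.


def minor_allele_freq(alleles):
    """MAF of a biallelic SNP given its 0/1 alleles across haplotypes."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def r_squared(a, b):
    """r^2 between two SNPs from phased 0/1 alleles on the same haplotypes."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    d = sum(x * y for x, y in zip(a, b)) / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def pvt(snps, tags, r2_min=0.85, min_maf=0.05):
    """Proportion of non-tag SNPs with MAF > min_maf that have r^2 >= r2_min with a tag.

    `snps[i]` holds the target population's alleles for SNP i; `tags` are the
    indices of tag SNPs chosen in a reference sample. Returns None when no
    SNP qualifies for evaluation.
    """
    tag_set = set(tags)
    evaluable = [i for i, s in enumerate(snps)
                 if i not in tag_set and minor_allele_freq(s) > min_maf]
    if not evaluable:
        return None
    covered = sum(
        1 for i in evaluable
        if any(r_squared(snps[i], snps[t]) >= r2_min for t in tag_set)
    )
    return covered / len(evaluable)


snps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]]
print(pvt(snps, [0]))     # 0.5  (SNP 1 is tagged by SNP 0; SNP 2 is not)
print(pvt(snps, [0, 2]))  # 1.0
```

Here r² is measured in the target population, not the reference, because PVT asks how well tags chosen elsewhere still capture the target's variation.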

Portability in the Tamilians and Bengalis of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. Each vertex of the triangle represents one of the three HapMap populations (CEU, CHB+JPT, YRI), with increasing distance from that vertex indicating a smaller percentage of that HapMap population present in the population mixture. The shading represents the level of portability as measured by PVT. Note that the darkest and lightest colors represent wider ranges of PVT values than the other colors. A black circle indicates the combination of the three HapMap samples that produces the highest PVT among the points tested (80% CEU, 5% CHB+JPT, 15% YRI for Tamilians; 60% CEU, 40% CHB+JPT, 0% YRI for Bengalis).
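The triangle plot can be read as a grid search over the simplex of three mixture proportions. The Python sketch below assumes a 5% grid. That step is consistent with the reported optima, but the legend does not state it. The search takes a generic scoring callable; in the setting described here, the score would be the PVT of tag SNPs selected from the corresponding mixture. Function names and the toy score are hypothetical.

```python
# Illustrative sketch; function names, grid step, and the toy score are hypothetical.


def simplex_grid(step=0.05):
    """Yield every (w1, w2, w3) with w_i >= 0 summing to 1, in increments of `step`."""
    n = round(1 / step)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield (i / n, j / n, (n - i - j) / n)


def best_mixture(score, step=0.05, labels=("CEU", "CHB+JPT", "YRI")):
    """Evaluate `score` at every grid mixture and return (best mixture, best score).

    `score` maps a {label: weight} dict to a number. Ties keep the first
    mixture encountered.
    """
    best_w, best_s = None, float("-inf")
    for w in simplex_grid(step):
        mix = dict(zip(labels, w))
        s = score(mix)
        if s > best_s:
            best_w, best_s = mix, s
    return best_w, best_s


def toy_score(mix):
    """Stand-in for PVT: peaks at 60% CEU, 40% CHB+JPT, 0% YRI."""
    return -((mix["CEU"] - 0.6) ** 2 + (mix["CHB+JPT"] - 0.4) ** 2 + mix["YRI"] ** 2)


print(len(list(simplex_grid())))   # 231
print(best_mixture(toy_score)[0])  # {'CEU': 0.6, 'CHB+JPT': 0.4, 'YRI': 0.0}
```

With a 5% step, the grid has 231 points, and both optima named in the legend lie on it.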

Portability in individual populations of tag SNPs chosen from different mixtures of HapMap populations, as measured by PVT. The figure design follows that of Figure 4, with a different color scale.
Similar articles
- Conrad DF, Jakobsson M, Coop G, Wen X, Wall JD, Rosenberg NA, Pritchard JK. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet. 2006 Nov;38(11):1251-60. doi: 10.1038/ng1911. Epub 2006 Oct 22. PMID: 17057719
- Teo YY, Sim X, Ong RT, Tan AK, Chen J, Tantoso E, Small KS, Ku CS, Lee EJ, Seielstad M, Chia KS. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations. Genome Res. 2009 Nov;19(11):2154-62. doi: 10.1101/gr.095000.109. Epub 2009 Aug 21. PMID: 19700652
- Sarkar-Roy N, Mondal D, Bhattacharya P, Majumder P. Int J Data Min Bioinform. 2011;5(6):706-16. doi: 10.1504/ijdmb.2011.045418. PMID: 22295752
- Li J, Pan YC, Li YX, Shi TL. [Analysis and application of SNP and haplotype in the human genome]. Yi Chuan Xue Bao. 2005 Aug;32(8):879-89. PMID: 16231744. Review. Chinese.
- Barnes MR. Brief Bioinform. 2006 Sep;7(3):211-24. doi: 10.1093/bib/bbl021. Epub 2006 Jul 28. PMID: 16877472. Review.
Cited by
- Liu EY, Li M, Wang W, Li Y. MaCH-admix: genotype imputation for admixed populations. Genet Epidemiol. 2013 Jan;37(1):25-37. doi: 10.1002/gepi.21690. Epub 2012 Oct 16. PMID: 23074066
- Kidd JR, Friedlaender F, Pakstis AJ, Furtado M, Fang R, Wang X, Nievergelt CM, Kidd KK. Single nucleotide polymorphisms and haplotypes in Native American populations. Am J Phys Anthropol. 2011 Dec;146(4):495-502. doi: 10.1002/ajpa.21560. Epub 2011 Sep 13. PMID: 21913176
- Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009 Sep 24;461(7263):489-94. doi: 10.1038/nature08365. PMID: 19779445
- Blant A, Kwong M, Szpiech ZA, Pemberton TJ. Weighted likelihood inference of genomic autozygosity patterns in dense genotype data. BMC Genomics. 2017 Dec 1;18(1):928. doi: 10.1186/s12864-017-4312-3. PMID: 29191164
- Egyud MR, Gajdos ZK, Butler JL, Tischfield S, Le Marchand L, Kolonel LN, Haiman CA, Henderson BE, Hirschhorn JN. Hum Genet. 2009 Apr;125(3):295-303. doi: 10.1007/s00439-009-0627-8. Epub 2009 Jan 28. PMID: 19184111