Imputation-based analysis of association studies: candidate regions and quantitative traits
Bertrand Servin et al. PLoS Genet. 2007 Jul.
Abstract
We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
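The impute-then-test idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method or the software named above: the imputation here is a plain least-squares prediction from a single tag SNP, and the association measure is a regression slope rather than a Bayes factor. The simulated data and every name (`simulate_genotypes`, `fit_line`, the allele frequency, the LD parameter, the effect size 0.5) are hypothetical choices made only for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_genotypes(n, rng, freq=0.3, ld=0.9):
    """Toy data (assumption): a tag SNP and a correlated second SNP, coded 0/1/2."""
    tag, untyped = [], []
    for _ in range(n):
        t = u = 0
        for _ in range(2):  # two haplotypes per individual
            a = 1 if rng.random() < freq else 0
            # With probability `ld` the second SNP copies the tag allele.
            b = a if rng.random() < ld else (1 if rng.random() < freq else 0)
            t += a
            u += b
        tag.append(t)
        untyped.append(u)
    return tag, untyped


rng = random.Random(0)

# Reference panel (e.g., resequencing data): both SNPs are observed.
ref_tag, ref_untyped = simulate_genotypes(200, rng)
b0, b1 = fit_line(ref_tag, ref_untyped)

# Study sample: only the tag SNP is typed; the phenotype depends on the untyped SNP.
study_tag, study_untyped = simulate_genotypes(500, rng)
phenotype = [0.5 * g + rng.gauss(0.0, 1.0) for g in study_untyped]

# Impute expected genotypes at the untyped SNP, then test them against the phenotype.
imputed = [b0 + b1 * t for t in study_tag]
_, effect = fit_line(imputed, phenotype)
print(f"estimated effect of the imputed SNP: {effect:.2f}")
```

In this toy setting the untyped SNP is never observed in the study sample, yet its effect on the phenotype can still be estimated through the imputed genotypes. The framework described in the abstract replaces both simplified steps with model-based imputation and Bayesian assessment of association.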
Conflict of interest statement
Competing interests. The authors have declared that no competing interests exist.
Figures

(A) single common variant, modest dominance; (B) single common variant, strong dominance for minor allele; (C) single rare variant, no dominance; (D) multiple common variants. Each colored line shows power of test varying with significance threshold (type I error). Black: BF from our method (prior D2); Green: pmin (allelic test); Red: pmin (genotype test); Blue: preg, multiple regression; Grey: BFmax. Each column of figures shows results for data analyzed under the “resequencing design” (left) and the “tag SNP design” (right). Each row shows results for the four different simulation scenarios.

Panels show: (a) errors in the estimates (posterior means) of the heterozygote effect (a + d); (b) errors in the estimates (posterior means) of the main effect (a); and (c) posterior probability of being a QTN (P((a, d) ≠ (0, 0))) assigned to the causal variant.

Solid line: Resequencing design; dashed line: tag SNP design, with tags selected using method from [19]; and dotted line: tag SNP design, with all SNPs except the causal SNP as tags.


The solid yellow line corresponds to d = 0 (additivity). The dashed red lines are the limits above and below which a SNP exhibits over-dominance.

Results shown are for all datasets for the common variant Scenarios (A) and (B) and for both the resequencing design and the tag SNP design. The discrepancy between the larger estimated BFs is caused by the fact that we used insufficient MCMC iterations to accurately estimate very large BFs (>10^6) under prior D1.

The figure shows, for each SNP in a dataset simulated under Scenario (D), the estimated posterior probability that it is a QTN, conditional on an association being observed. Left: Results from one-QTN model. Right: Results from multi-QTN model allowing up to four QTNs. The four actual QTNs are indicated with a star. Colors of the vertical lines indicate tag SNP “bins” (i.e., groups of SNPs tagged by the same variant).
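As a rough illustration of the one-QTN posterior probabilities this caption refers to, the sketch below assumes exactly one QTN in the region and an equal prior probability for every SNP, so that each SNP's posterior probability is its Bayes factor divided by the sum of all Bayes factors. Whether the paper uses exactly this prior is not stated on this page; the function name and the log10 BF values are made up for the example.

```python
def posterior_one_qtn(log10_bfs):
    """Posterior probability that each SNP is the single QTN.

    Assumes exactly one QTN and equal priors across SNPs, so the posterior
    is proportional to each SNP's Bayes factor.
    """
    largest = max(log10_bfs)
    # Subtract the largest value before exponentiating to avoid overflow.
    weights = [10.0 ** (b - largest) for b in log10_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log10 Bayes factors for four SNPs in a candidate region.
example_log10_bfs = [0.2, 3.1, 2.9, -0.4]
probs = posterior_one_qtn(example_log10_bfs)
for snp_index, p in enumerate(probs):
    print(f"SNP {snp_index}: posterior probability {p:.3f}")
```

Two SNPs in strong correlation with similar Bayes factors share the posterior mass between them, which is why a cluster of SNPs in the same tag "bin" can each receive a moderate probability.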

Left panel shows the posterior probability assigned to each SNP being a QTN, with filled triangles denoting tag SNPs and open circles denoting non-tag SNPs. The right panel shows (in gray) estimated posterior densities of the additive effect for each of the seven SNPs assigned the highest posterior probabilities of non-zero effect (representing 90% of the posterior mass). The average of these curves is shown in black.
References
- SeattleSNPs. Seattle (Washington): NHLBI Program for Genomic Applications; Available: http://pga.gs.washington.edu. Accessed 12 June 2007.
- Kraft P, Pharoah P, Chanock SJ, Albanes D, Kolonel LN, et al. Genetic variation in the HSD17B1 gene and risk of prostate cancer. PLoS Genet. 2005;1:e68. doi: 10.1371/journal.pgen.0010068.