Imputation-based analysis of association studies: candidate regions and quantitative traits

Bertrand Servin et al. PLoS Genet. 2007 Jul.

Abstract

We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
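A minimal sketch of the impute-then-test idea, under simplifying assumptions: genotype probabilities at an untyped SNP are taken as given (in practice they would come from a model of SNP correlation fitted to reference data such as HapMap), each individual's probabilities are collapsed to an expected allele count ("dosage"), and the phenotype is regressed on that dosage by ordinary least squares. This frequentist dosage test is a swapped-in simplification, not the Bayes-factor approach implemented in Bim-Bam, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def expected_dosage(probs: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Map (P(g=0), P(g=1), P(g=2)) per individual to E[g] = P(g=1) + 2*P(g=2)."""
    return [p1 + 2.0 * p2 for _, p1, p2 in probs]

def dosage_regression(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """Fit y = b0 + b1*x by least squares; return (b1, t statistic for b1)."""
    n = len(y)
    if n != len(x) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("dosage shows no variation across individuals")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit gives se == 0; report an unbounded t with the slope's sign.
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t
```

Collapsing to a dosage ignores how uncertain each imputed genotype is; it is used here only because it keeps the example short.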


Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Power Comparisons**
(A) single common variant, modest dominance; (B) single common variant, strong dominance for the minor allele; (C) single rare variant, no dominance; (D) multiple common variants. Each colored line shows the power of a test as a function of the significance threshold (type I error rate). Black: BF from our method (prior D₂); Green: *p_min* (allelic test); Red: *p_min* (genotype test); Blue: *p_reg*, multiple regression; Grey: BF_max. The left column shows results for data analyzed under the “resequencing design” and the right column under the “tag SNP design.” Each row shows results for one of the four simulation scenarios.
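Bayes factors and p-values live on different scales, so one common way to compare them on a power plot is at matched type I error: calibrate each statistic's rejection threshold on datasets simulated under the null, then report the fraction of alternative datasets that exceed it. The sketch below assumes that larger values mean stronger evidence (e.g., log10 BF or -log10 p); the function names are hypothetical and this is not claimed to be the exact procedure behind the figure.

```python
import math
from typing import List, Sequence, Tuple

def power_at_alpha(null_stats: Sequence[float], alt_stats: Sequence[float], alpha: float) -> float:
    """Empirical power at type I error <= alpha (larger statistic = more significant).

    The threshold is taken from the null simulations so that at most a fraction
    alpha of them strictly exceed it; power is the fraction of alternative
    statistics that strictly exceed the same threshold.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ranked = sorted(null_stats, reverse=True)
    k = math.floor(alpha * len(ranked))
    threshold = ranked[k] if k < len(ranked) else -math.inf
    return sum(s > threshold for s in alt_stats) / len(alt_stats)

def power_curve(null_stats: Sequence[float], alt_stats: Sequence[float],
                alphas: Sequence[float]) -> List[Tuple[float, float]]:
    """(alpha, power) pairs, i.e. one colored line of a power-vs-threshold plot."""
    return [(a, power_at_alpha(null_stats, alt_stats, a)) for a in alphas]
```

Calibrating on null simulations makes the comparison fair even when a statistic, such as a BF, has no analytic null distribution.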

**Figure 2. Comparison of Results for Resequencing Design (x-axis) and Tag SNP Design (y-axis)**
Panels show: (a) errors in the estimates (posterior means) of the heterozygote effect (*a* + *d*); (b) errors in the estimates (posterior means) of the main effect (*a*); and (c) posterior probability of being a QTN, P((*a*, *d*) ≠ (0, 0)), assigned to the causal variant.

**Figure 3. Examination of Potential Effect of Different Tag SNP Strategies on Power, When the Causal Variant is Rare (0.01 < MAF < 0.05)**
Solid line: resequencing design; dashed line: tag SNP design, with tags selected using the method from [19]; dotted line: tag SNP design, with all SNPs except the causal SNP as tags.
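The tag SNP designs in Figures 3 and 7 depend on how tags and their “bins” are chosen. The method of [19] is not described on this page, so the following is only a hedged sketch of one familiar style of greedy r²-threshold binning: repeatedly pick the SNP that covers the most not-yet-tagged SNPs at r² ≥ a cutoff (0.8 by default, an assumption) and make those SNPs its bin. As a further simplification, r² is computed between genotype counts (0/1/2) rather than phased haplotypes; all names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype-count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated with everything
    return sxy * sxy / (sxx * syy)

def greedy_tag_bins(genotypes: Dict[str, Sequence[int]],
                    r2_min: float = 0.8) -> Dict[str, List[str]]:
    """Map each chosen tag SNP to the SNPs in its bin (the tag included)."""
    untagged = set(genotypes)
    bins: Dict[str, List[str]] = {}
    while untagged:
        best_tag: Optional[str] = None
        best_cover: List[str] = []
        for cand in sorted(untagged):  # sorted for deterministic tie-breaking
            cover = sorted(s for s in untagged
                           if s == cand or r_squared(genotypes[cand], genotypes[s]) >= r2_min)
            if len(cover) > len(best_cover):
                best_tag, best_cover = cand, cover
        bins[best_tag] = best_cover
        untagged -= set(best_cover)
    return bins
```

Lowering `r2_min` produces fewer, larger bins, and hence fewer tags to genotype.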

**Figure 4. Power of the Multipoint Approach in the Rare Variant Scenario for Two Different Imputation Algorithms**

**Figure 5. Scatter Plot of Samples from Prior Distribution of a (x-axis) and a + d (y-axis), for Priors D₁ (Black) and D₂ (Blue)**
The solid yellow line corresponds to d = 0 (additivity). The dashed red lines are the limits above and below which a SNP exhibits over-dominance.
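Figures 2 and 5 describe a SNP's effect through an additive effect *a* and a dominance deviation *d*. The sketch below assumes the common coding in which the two homozygotes have mean phenotype μ - a and μ + a and the heterozygote has μ + d, so that relative to the first homozygote the heterozygote effect is a + d and the other homozygote's effect is 2a; the exact coding used in the paper is an assumption here. Under this coding d = 0 is additivity, and the heterozygote mean falls outside the range of the homozygote means exactly when |d| > |a|, which in the (a, a + d) plane corresponds to lying above both or below both of the lines a + d = 0 and a + d = 2a. The function names are hypothetical.

```python
from typing import Tuple

def genotype_means(mu: float, a: float, d: float) -> Tuple[float, float, float]:
    """Mean phenotype for (homozygote 1, heterozygote, homozygote 2) under the assumed coding."""
    return (mu - a, mu + d, mu + a)

def dominance_class(a: float, d: float, tol: float = 1e-12) -> str:
    """Label an (a, d) pair; 'over-dominance' here means the heterozygote mean
    lies outside the interval spanned by the two homozygote means (|d| > |a|)."""
    if abs(a) <= tol and abs(d) <= tol:
        return "no effect"
    if abs(d) <= tol:
        return "additive"
    if abs(d) > abs(a) + tol:
        return "over-dominance"
    if abs(abs(d) - abs(a)) <= tol:
        return "complete dominance"
    return "partial dominance"
```

For example, `genotype_means(10.0, 1.0, 0.5)` gives a heterozygote effect of 1.5 (= a + d) and a homozygote contrast of 2.0 (= 2a) relative to the first homozygote.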

**Figure 6. Comparison of Inferences Using Priors D₁ and D₂ for the BF (Left) and the Posterior Probability Assigned to the Causal Locus Being a QTN (Right)**
Results shown are for all datasets in the common-variant Scenarios (A) and (B), under both the resequencing design and the tag SNP design. The discrepancy among the largest estimated BFs arises because we used too few MCMC iterations to estimate very large BFs (>10⁶) accurately under prior D₁.
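Why very large BFs need many iterations can be seen in a toy model, assumed purely for illustration (it is neither the paper's phenotype model nor prior D₁ or D₂): a single effect β whose estimate β̂ has known sampling variance V, with prior β ~ N(0, τ²). This BF has a closed form, and it can also be estimated by averaging the likelihood ratio over prior draws. That naive Monte Carlo estimator is a swapped-in illustration, not the MCMC scheme the authors used; it is accurate for modest BFs, but when the likelihood is concentrated where prior draws rarely land, its variance grows and a typical run falls short unless the number of draws is increased substantially. Function names are hypothetical.

```python
import math
import random

def exact_bf(beta_hat: float, v: float, tau2: float) -> float:
    """Closed-form BF for beta ~ N(0, tau2) versus beta = 0, given beta_hat ~ N(beta, v)."""
    return math.sqrt(v / (v + tau2)) * math.exp(beta_hat ** 2 * tau2 / (2.0 * v * (v + tau2)))

def monte_carlo_bf(beta_hat: float, v: float, tau2: float, draws: int, seed: int = 0) -> float:
    """Average of the likelihood ratio L(beta) / L(0) over draws beta ~ N(0, tau2)."""
    rng = random.Random(seed)
    tau = math.sqrt(tau2)
    total = 0.0
    for _ in range(draws):
        beta = rng.gauss(0.0, tau)
        # log N(beta_hat; beta, v) - log N(beta_hat; 0, v)
        total += math.exp((beta_hat ** 2 - (beta - beta_hat) ** 2) / (2.0 * v))
    return total / draws
```

Comparing the two functions over a grid of `beta_hat` values at a fixed number of draws is a quick way to see where the Monte Carlo estimate starts to become unreliable.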

**Figure 7. Illustration of How a Multi-QTN Model Can Provide Fuller Explanations Than a One-QTN Model for Observed Associations**
The figure shows, for each SNP in a dataset simulated under Scenario (D), the estimated posterior probability that it is a QTN, conditional on an association being observed. Left: Results from one-QTN model. Right: Results from multi-QTN model allowing up to four QTNs. The four actual QTNs are indicated with a star. Colors of the vertical lines indicate tag SNP “bins” (i.e., groups of SNPs tagged by the same variant).
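For the one-QTN model, if every SNP is assumed a priori equally likely to be the QTN, the posterior probability that SNP j is the QTN, conditional on there being one, reduces to BF_j divided by the sum of all SNPs' BFs. The sketch below computes these normalized probabilities from log10 BFs, using a log-sum-exp shift to avoid overflow; the equal-prior assumption and function name are ours, and the multi-QTN model in the right panel, which must weigh combinations of SNPs, is not attempted here.

```python
import math
from typing import Dict

def one_qtn_posteriors(log10_bfs: Dict[str, float]) -> Dict[str, float]:
    """P(SNP j is the QTN | data, exactly one QTN) = BF_j / sum_k BF_k,
    assuming an equal prior probability for every SNP."""
    if not log10_bfs:
        raise ValueError("need at least one SNP")
    ln10 = math.log(10.0)
    logs = {snp: lb * ln10 for snp, lb in log10_bfs.items()}
    m = max(logs.values())  # subtract the max before exponentiating (log-sum-exp)
    weights = {snp: math.exp(l - m) for snp, l in logs.items()}
    total = sum(weights.values())
    return {snp: w / total for snp, w in weights.items()}
```

Working with log10 BFs matters in practice: a BF of 10⁴⁰⁰ cannot be stored as a double, but the normalized probabilities are still well defined.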

**Figure 8. Results for the SCN1A Dataset**
The left panel shows the posterior probability assigned to each SNP being a QTN, with filled triangles denoting tag SNPs and open circles denoting non-tag SNPs. The right panel shows (in gray) estimated posterior densities of the additive effect for each of the seven SNPs assigned the highest posterior probabilities of non-zero effect (representing 90% of the posterior mass). The average of these curves is shown in black.
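Selecting “the SNPs that carry 90% of the posterior mass” can be done by ranking SNPs by posterior probability and adding them until the cumulative mass reaches the target, which yields the smallest such set. The sketch assumes per-SNP posterior probabilities are already available and sum to at most 1; the function name and the toy numbers in the test are illustrative, not results from the SCN1A analysis.

```python
from typing import Dict, List

def top_mass_set(post: Dict[str, float], level: float = 0.9) -> List[str]:
    """Smallest list of SNPs, in decreasing posterior probability, whose
    probabilities sum to at least `level` (all SNPs if the target is never reached)."""
    chosen: List[str] = []
    mass = 0.0
    for snp, p in sorted(post.items(), key=lambda kv: kv[1], reverse=True):
        if mass >= level:
            break
        chosen.append(snp)
        mass += p
    return chosen
```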

Cited by

- Manolio TA. Bringing genome-wide association findings into clinical use. Nat Rev Genet. 2013 Aug;14(8):549-58. doi: 10.1038/nrg3523. Epub 2013 Jul 9. PMID: 23835440. Review.
- Zhou X, Carbonetto P, Stephens M. Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet. 2013;9(2):e1003264. doi: 10.1371/journal.pgen.1003264. Epub 2013 Feb 7. PMID: 23408905.
- Wang Y, Cai Z, Stothard P, Moore S, Goebel R, Wang L, Lin G. Fast accurate missing SNP genotype local imputation. BMC Res Notes. 2012 Aug 3;5:404. doi: 10.1186/1756-0500-5-404. PMID: 22863359.
- Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012 May 15;28(10):1353-8. doi: 10.1093/bioinformatics/bts163. Epub 2012 Apr 6. PMID: 22492648.
- Chen W, Larrabee BR, Ovsyannikova IG, Kennedy RB, Haralambieva IH, Poland GA, Schaid DJ. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics. 2015 Jul;200(3):719-36. doi: 10.1534/genetics.115.176107. Epub 2015 May 6. PMID: 25948564.

References

1. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
2. SeattleSNPs. Seattle (Washington): NHLBI Program for Genomic Applications; Available: http://pga.gs.washington.edu. Accessed 12 June 2007.
3. Kraft P, Pharoah P, Chanock SJ, Albanes D, Kolonel LN, et al. Genetic variation in the HSD17B1 gene and risk of prostate cancer. PLoS Genet. 2005;1:e68. doi: 10.1371/journal.pgen.0010068.
4. Li N, Stephens M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics. 2003;165:2213–2233.
5. Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78:629–644.
