Rare-variant association testing for sequencing data with the sequence kernel association test
Sat Jan 01 2011
Michael C Wu et al. Am J Hum Genet. 2011.
Abstract
Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single-marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait while easily adjusting for covariates. As a score-based variance-component test, SKAT can quickly calculate p values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30 kb regions, requires only 7 hr on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample-size calculations to help design candidate-gene, whole-exome, and whole-genome sequence association studies.
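The score-based idea in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It computes a SKAT-style statistic with a weighted linear kernel, Q = sum_j w_j * S_j^2, where S_j is the score of variant j against the null-model residuals, and compares it with a burden-style statistic that collapses the variants before squaring. The sketch assumes a continuous trait and an intercept-only null model (no covariates), so the residuals are simply y minus its mean. The function names, the toy genotypes, and the unit weights are illustrative assumptions, and the analytic p value calculation is omitted.

```python
from statistics import fmean

def null_residuals(y):
    """Residuals under an intercept-only null model (assumption: no covariates)."""
    mu = fmean(y)
    return [yi - mu for yi in y]

def variant_scores(G, resid):
    """Per-variant score S_j = sum_i G[i][j] * resid[i]."""
    n_variants = len(G[0])
    return [sum(row[j] * r for row, r in zip(G, resid)) for j in range(n_variants)]

def skat_q(G, y, weights):
    """Variance-component style statistic: sum_j w_j * S_j**2."""
    scores = variant_scores(G, null_residuals(y))
    return sum(w * s * s for w, s in zip(weights, scores))

def burden_q(G, y, weights):
    """Burden-style statistic: (sum_j w_j * S_j)**2, i.e. collapse first, then square."""
    scores = variant_scores(G, null_residuals(y))
    return sum(w * s for w, s in zip(weights, scores)) ** 2

# Toy data (hypothetical): 6 individuals, 2 rare variants with opposite-direction effects.
G = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0]
w = [1.0, 1.0]

print(skat_q(G, y, w))    # opposite-sign scores still contribute after squaring
print(burden_q(G, y, w))  # opposite-sign scores cancel before squaring
```

In this toy case the two variants push the trait in opposite directions, so the collapsed burden score is zero while the variance-component statistic stays positive. Only the null model is fitted, which is why such a statistic is cheap to evaluate region by region.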
Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Figures

Figure 1. Simulation-Study-Based Power Comparisons of SKAT and Burden Tests. Empirical power at $\alpha = 10^{-6}$ under an assumption that 5% of the rare variants with MAF < 3% within random 30 kb regions were causal. Top panel: continuous phenotypes with maximum effect size ($|\beta|$) equal to 1.6 when MAF = $10^{-4}$; bottom panel: case-control studies with maximum OR = 5 when MAF = $10^{-4}$. Regression coefficients for the s causal variants were assumed to be a decreasing function of MAF as $|\beta_j| = c\,|\log_{10} \mathrm{MAF}_j|$ ($j = 1, \ldots, p$ [see Figure S2]), where c was chosen to result in these maximum effect sizes. From left to right, the plots consider settings in which the coefficients for the causal rare variants are 100% positive (0% negative), 80% positive (20% negative), and 50% positive (50% negative). Total sample sizes considered are 500, 1000, 2500, and 5000, with half being cases in case-control studies. For each setting, six methods are compared: SKAT, SKAT in which 10% of the genotypes were set to missing and then imputed (SKAT_M), restricted SKAT (rSKAT) in which unweighted SKAT is applied to variants with MAF < 3%, the weighted sum burden test (W) with the same weights as used by SKAT, counting-based burden test (N), and the CAST method (C). All the burden tests used MAF < 3% as the threshold. For each method, power was estimated as the proportion of p values < $\alpha$ among 1000 simulated data sets.
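As a worked illustration of the effect-size rule in this caption, the sketch below solves for the constant c that gives the stated maximum effect at MAF = 10^-4 and then evaluates the coefficient at a few other allele frequencies. It assumes that the case-control maximum OR = 5 is applied on the log-odds scale (so the maximum coefficient is ln 5); the function names and the example MAF values are hypothetical.

```python
import math

def scale_constant(max_effect, ref_maf=1e-4):
    """Constant c such that c * |log10(ref_maf)| equals max_effect."""
    return max_effect / abs(math.log10(ref_maf))

def effect_size(maf, c):
    """|beta_j| = c * |log10(MAF_j)|, which shrinks as MAF grows."""
    return c * abs(math.log10(maf))

c_cont = scale_constant(1.6)         # continuous trait: max |beta| = 1.6
c_bin = scale_constant(math.log(5))  # case-control: max OR = 5 (assumed log-odds scale)

for maf in (1e-4, 1e-3, 1e-2, 3e-2):
    beta = effect_size(maf, c_cont)
    odds_ratio = math.exp(effect_size(maf, c_bin))
    print(f"MAF={maf:g}  |beta|={beta:.2f}  OR={odds_ratio:.2f}")
```

Because |log10(10^-4)| = 4, the constant is 1.6 / 4 = 0.4 for the continuous trait and about ln(5) / 4 for the case-control setting under the stated assumption, so rarer variants receive larger effects.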

Figure 2. Sample Sizes Required for Reaching 80% Power. Analytically estimated sample sizes required for reaching 80% power to detect rare variants associated with a continuous phenotype (top panel) or a dichotomous phenotype in case-control studies in which half are cases (bottom panel) at the $\alpha = 10^{-6}$, $10^{-3}$, and $10^{-2}$ levels, under the assumption that 5% of rare variants with MAF < 3% within the 30 kb regions are causal. Plots correspond to 100%, 80%, and 50% of the causal variants being associated with an increase in the continuous phenotype or in the risk of the dichotomous phenotype. Regression coefficients for the s causal variants were assumed to be the same decreasing function of MAF as that in Figure 1. Required total sample sizes are plotted against the maximum effect sizes (ORs) when MAF = $10^{-4}$. Estimated total sample sizes were averaged over 100 random 30 kb regions.

Figure 3. Power Comparisons Based on Simulation and Analytic Estimation. Power as a function of total sample size estimated by simulation with 1000 replicates and by the proposed power formula for continuous and dichotomous case-control traits. Simulation configurations correspond to those used in Figure 1, in which 80% of the regression coefficients for the causal rare variants were positive.
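When comparing simulated and analytic power, it helps to know how precise a 1000-replicate estimate is. The sketch below computes empirical power as the proportion of p values below alpha, as described in the captions, and attaches the usual binomial standard error. The p values are made up for illustration, and the helper names are hypothetical.

```python
import math

def empirical_power(p_values, alpha):
    """Proportion of simulated p values that fall below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

def monte_carlo_se(power, n_replicates):
    """Binomial standard error of an estimated proportion."""
    return math.sqrt(power * (1.0 - power) / n_replicates)

# Hypothetical p values: 800 of 1000 replicates fall below alpha = 1e-6.
p_values = [1e-8] * 800 + [0.5] * 200
power = empirical_power(p_values, alpha=1e-6)
se = monte_carlo_se(power, len(p_values))
print(power, round(se, 4))
```

With 1000 replicates the standard error is at most about 0.016 (reached when the true power is 0.5), so simulated and analytic curves that differ by a few percentage points may still be consistent with each other.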