pubmed.ncbi.nlm.nih.gov

Rare-variant association testing for sequencing data with the sequence kernel association test - PubMed

️Sat Jan 01 2011

Rare-variant association testing for sequencing data with the sequence kernel association test

Michael C Wu et al. Am J Hum Genet. 2011.

Abstract

Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single-marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait while easily adjusting for covariates. As a score-based variance-component test, SKAT can quickly calculate p values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30 kb regions, requires only 7 hr on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample-size calculations to help design candidate-gene, whole-exome, and whole-genome sequence association studies.
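To illustrate why a score-based variance-component test needs only the null model, here is a minimal sketch of the SKAT statistic Q = (y − μ̂)′ G W G′ (y − μ̂) for a continuous trait. It makes several simplifying assumptions that are not stated in the abstract: the null model contains an intercept only (no covariates), genotypes are coded as minor-allele counts 0/1/2, and the weights follow the Beta(1, 25) density of the sample MAF, which is the default described in the full paper. The function names are hypothetical, and the p value, which comes from a mixture of chi-square distributions, is not computed here.

```python
def beta_1_25_density(maf):
    """Beta(1, 25) density evaluated at maf, i.e. 25 * (1 - maf)**24."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(y, genotypes):
    """SKAT-style statistic Q for a continuous trait (illustrative sketch).

    y         : list of phenotype values, one per individual
    genotypes : list of rows, one per individual; each row holds
                minor-allele counts (0, 1, or 2) for the p variants

    Assumption: the null model is intercept-only, so the fitted null
    mean is simply the sample mean of y.
    """
    n = len(y)
    mean_y = sum(y) / n
    resid = [yi - mean_y for yi in y]          # residuals under the null

    q = 0.0
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        maf = sum(col) / (2.0 * n)             # sample minor-allele frequency
        weight = beta_1_25_density(maf) ** 2   # w_j, up-weights rarer variants
        score = sum(g * r for g, r in zip(col, resid))
        q += weight * score ** 2               # squared scores cannot cancel
    return q
```

Because each variant contributes a squared score, variants with opposite effect directions add to Q rather than cancelling, which is the contrast with burden tests that the simulations explore.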


Figures

**Figure 1**
Simulation-Study-Based Power Comparisons of SKAT and Burden Tests. Empirical power at α = 10⁻⁶ under an assumption that 5% of the rare variants with MAF < 3% within random 30 kb regions were causal. Top panel: continuous phenotypes with maximum effect size (|β|) equal to 1.6 when MAF = 10⁻⁴; bottom panel: case-control studies with maximum OR = 5 when MAF = 10⁻⁴. Regression coefficients for the s causal variants were assumed to be a decreasing function of MAF as |βⱼ| = c·|log₁₀ MAFⱼ| (j = 1, …, p; see Figure S2), where c was chosen to result in these maximum effect sizes. From left to right, the plots consider settings in which the coefficients for the causal rare variants are 100% positive (0% negative), 80% positive (20% negative), and 50% positive (50% negative). Total sample sizes considered are 500, 1000, 2500, and 5000, with half being cases in case-control studies. For each setting, six methods are compared: SKAT, SKAT in which 10% of the genotypes were set to missing and then imputed (SKAT_M), restricted SKAT (rSKAT) in which unweighted SKAT is applied to variants with MAF < 3%, the weighted sum burden test (W) with the same weights as used by SKAT, counting-based burden test (N), and the CAST method (C). All the burden tests used MAF < 3% as the threshold. For each method, power was estimated as the proportion of p values < α among 1000 simulated data sets.
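As a worked example of the effect-size model in this caption, the sketch below derives the constant c from the stated maximum effect sizes and evaluates |β| = c·|log₁₀ MAF| at other frequencies. It assumes that, for case-control traits, β is a log odds ratio, so the maximum |β| is ln 5; the helper names are hypothetical. The last function mirrors the caption's definition of empirical power as the proportion of p values below α.

```python
import math


def effect_scale(max_effect, ref_maf=1e-4):
    """Constant c such that c * |log10(ref_maf)| equals max_effect."""
    return max_effect / abs(math.log10(ref_maf))


def effect_size(maf, c):
    """Magnitude of the regression coefficient for a causal variant."""
    return c * abs(math.log10(maf))


def empirical_power(p_values, alpha=1e-6):
    """Proportion of simulated data sets with p value below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Continuous trait: maximum |beta| = 1.6 at MAF = 1e-4, so c = 1.6 / 4.
c_continuous = effect_scale(1.6)

# Case-control trait (assumption: beta is a log odds ratio),
# maximum OR = 5 at MAF = 1e-4, so c = ln(5) / 4.
c_case_control = effect_scale(math.log(5.0))

# Effect sizes shrink as variants become more common.
beta_rare = effect_size(1e-4, c_continuous)
beta_common = effect_size(0.03, c_continuous)
```

Under these assumptions both constants come out close to 0.4, and a variant at the 3% MAF threshold carries a noticeably smaller effect than one at MAF = 10⁻⁴.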

**Figure 2**
Sample Sizes Required for Reaching 80% Power. Analytically estimated sample sizes required for reaching 80% power to detect rare variants associated with a continuous phenotype (top panel) or a dichotomous phenotype in case-control studies in which half are cases (bottom panel) at the α = 10⁻⁶, 10⁻³, and 10⁻² levels, under the assumption that 5% of rare variants with MAF < 3% within the 30 kb regions are causal. Plots correspond to 100%, 80%, and 50% of the causal variants being associated with an increase in the continuous phenotype or in the risk of the dichotomous phenotype. Regression coefficients for the s causal variants were assumed to be the same decreasing function of MAF as that in Figure 1. The required total sample sizes are plotted against the absolute values of the maximum effect sizes (ORs) when MAF = 10⁻⁴. Estimated total sample sizes were averaged over 100 random 30 kb regions.

**Figure 3**
Power Comparisons Based on Simulation and Analytic Estimation. Power as a function of total sample size estimated by simulation with 1000 replicates and by the proposed power formula for continuous and dichotomous case-control traits. Simulation configurations correspond to those used in Figure 1, in which 80% of the regression coefficients for the causal rare variants were positive.

Cited by

Imputation-based meta-analysis of severe malaria in three African populations.
Band G, Le QS, Jostins L, Pirinen M, Kivinen K, Jallow M, Sisay-Joof F, Bojang K, Pinder M, Sirugo G, Conway DJ, Nyirongo V, Kachala D, Molyneux M, Taylor T, Ndila C, Peshu N, Marsh K, Williams TN, Alcock D, Andrews R, Edkins S, Gray E, Hubbart C, Jeffreys A, Rowlands K, Schuldt K, Clark TG, Small KS, Teo YY, Kwiatkowski DP, Rockett KA, Barrett JC, Spencer CC; Malaria Genomic Epidemiology Network. PLoS Genet. 2013 May;9(5):e1003509. doi: 10.1371/journal.pgen.1003509. Epub 2013 May 23. PMID: 23717212 Free PMC article.
A telescope GWAS analysis strategy, based on SNPs-genes-pathways ensamble and on multivariate algorithms, to characterize late onset Alzheimer's disease.
Squillario M, Abate G, Tomasi F, Tozzo V, Barla A, Uberti D; Alzheimer’s Disease Neuroimaging Initiative. Sci Rep. 2020 Jul 21;10(1):12063. doi: 10.1038/s41598-020-67699-8. PMID: 32694537 Free PMC article.
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.
Richardson TG, Shihab HA, Rivas MA, McCarthy MI, Campbell C, Timpson NJ, Gaunt TR. PLoS One. 2016 Apr 29;11(4):e0153803. doi: 10.1371/journal.pone.0153803. eCollection 2016. PMID: 27128313 Free PMC article.
Knowledge-constrained K-medoids Clustering of Regulatory Rare Alleles for Burden Tests.
Sivley RM, Fish AE, Bush WS. Evol Comput Mach Learn Data Min Bioinform. 2013;7833:35-42. doi: 10.1007/978-3-642-37189-9_4. PMID: 25541630 Free PMC article.
Gastrointestinal stromal tumors, somatic mutations and candidate genetic risk variants.
O'Brien KM, Orlow I, Antonescu CR, Ballman K, McCall L, DeMatteo R, Engel LS. PLoS One. 2013 Apr 18;8(4):e62119. doi: 10.1371/journal.pone.0062119. Print 2013. PMID: 23637977 Free PMC article. Clinical Trial.

