Review

Replicability and Prediction: Lessons and Challenges from GWAS

Urko M Marigorta et al. Trends Genet. 2018 Jul.

Abstract

Since the publication of the Wellcome Trust Case Control Consortium (WTCCC) landmark study a decade ago, genome-wide association studies (GWAS) have led to the discovery of thousands of risk variants involved in disease etiology. This success story has two angles that are often overlooked. First, GWAS findings are highly replicable. This is an unprecedented phenomenon in complex trait genetics, and indeed in many areas of science, which in past decades have been plagued by false positives. At a time of increasing concerns about the lack of reproducibility, we examine the biological and methodological reasons that account for the replicability of GWAS and identify the challenges ahead. In contrast to the exemplary success of disease gene discovery, at present GWAS findings are not useful for predicting phenotypes. We close with an overview of the prospects for individualized prediction of disease risk and its foreseeable impact in clinical practice.

Keywords: GWAS; genetic architecture; genetic risk score; prediction; replicability.

Copyright © 2018 Elsevier Ltd. All rights reserved.

Figures

Figure 1, Key Figure. An increasing proportion of GWAS findings correspond to replications of SNP-trait associations that had been previously reported

The graph reports the number of discoveries of new loci and re-discoveries of previously discovered loci for 60 diseases included in the GWAS Catalog (quantitative traits were not considered; last accessed 17th March 2017). Data are classified according to the semester of publication (from 2005 to 2016). Given that all records included in the Catalog achieve at least P < 10⁻⁵, any newly included SNP-disease pair that a previous GWAS had already placed in the Catalog can be considered a replication of the first finding. We labelled these instances as re-discoveries, or replicas. The re-discovery figure for a given semester corresponds to the total cumulative number since 2005, separated according to whether the re-discovery event constitutes the first evidence for replication of a given locus (‘first replica’) or the locus had been previously replicated (‘repeated replica’). Given the diversity of arrays before the recent generalization of imputation in GWAS, SNPs are considered replicated either when the same SNP is re-discovered or when another SNP with R² ≥ 0.8 (using 1KG Europeans) within a ±500 kb window is reported. Only GWAS performed in Eurasian populations were evaluated.
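The replication rule in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the `r2` lookup table, the rsIDs, and the choice to compare each new report only against the first-reported SNP of a locus are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int  # base-pair position


def matches(a, b, r2, window=500_000, r2_min=0.8):
    """True if b is the same SNP as a, or lies within the window on the
    same chromosome and is in LD with a at r2 >= r2_min."""
    if a.rsid == b.rsid:
        return True
    if a.chrom != b.chrom or abs(a.pos - b.pos) > window:
        return False
    return r2.get(frozenset((a.rsid, b.rsid)), 0.0) >= r2_min


def classify(reports, r2):
    """reports: (disease, Snp) pairs in chronological order.
    Returns one label per report: new, first replica, or repeated replica."""
    loci = {}  # disease -> list of [first-reported Snp, replication count]
    labels = []
    for disease, snp in reports:
        known = loci.setdefault(disease, [])
        hit = next((loc for loc in known if matches(loc[0], snp, r2)), None)
        if hit is None:
            known.append([snp, 0])
            labels.append("new")
        else:
            labels.append("first replica" if hit[1] == 0 else "repeated replica")
            hit[1] += 1
    return labels


# Hypothetical rsIDs, positions, and LD values, for illustration only.
r2 = {frozenset(("rs1", "rs2")): 0.9}
reports = [
    ("T2D", Snp("rs1", "1", 1_000_000)),
    ("T2D", Snp("rs2", "1", 1_200_000)),  # in LD with rs1, 200 kb away
    ("T2D", Snp("rs1", "1", 1_000_000)),  # same SNP reported again
    ("T2D", Snp("rs3", "1", 5_000_000)),  # outside the 500 kb window
    ("CD", Snp("rs1", "1", 1_000_000)),   # same SNP, different disease
]
labels = classify(reports, r2)
```

Replication is tracked per SNP-disease pair, so the same SNP reported for a second disease counts as a new discovery rather than a replica.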

Figure 2. Risk variants discovered by GWAS replicate similarly regardless of the Odds Ratio (OR)

X-axis: Classified by disease, percentage of SNPs with a discovery OR between 1.2 and 1.5 that have been re-discovered at least once and included in the GWAS Catalog (filled circles) or that achieve nominal evidence of association (P < 0.05) in a large meta-analysis published after the discovery GWAS (inverted triangles). Y-axis: for the same diseases, proportion of re-discovery for risk SNPs with discovery OR < 1.2. Although the percentages vary by disease, the average re-discovery of low-effect variants is not significantly lower than that of variants with larger effect (P = 0.93, Kolmogorov-Smirnov test) and overall re-discovery estimates are highly correlated (Spearman’s rho: 0.54, P = 10⁻¹⁶).
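The per-disease percentages plotted on the two axes could be tabulated roughly as follows. This is a sketch under assumptions: the record layout `(disease, discovery_or, rediscovered)` is hypothetical, odds ratios are assumed to be oriented to the risk-increasing allele, and the toy records are invented purely to exercise the function.

```python
from collections import defaultdict


def rediscovery_by_bin(records):
    """records: iterable of (disease, discovery_or, rediscovered) tuples.
    Returns {disease: {"low": pct, "mid": pct}}, where "low" is OR < 1.2
    and "mid" is 1.2 <= OR <= 1.5. Larger ORs are ignored."""
    counts = defaultdict(lambda: {"low": [0, 0], "mid": [0, 0]})
    for disease, odds_ratio, rediscovered in records:
        if odds_ratio < 1.2:
            bin_name = "low"
        elif odds_ratio <= 1.5:
            bin_name = "mid"
        else:
            continue
        tally = counts[disease][bin_name]  # [re-discovered, total]
        tally[0] += int(rediscovered)
        tally[1] += 1
    return {
        disease: {
            name: (100.0 * hits / total if total else None)
            for name, (hits, total) in bins.items()
        }
        for disease, bins in counts.items()
    }


# Toy records, not taken from the GWAS Catalog.
records = [
    ("A", 1.10, True), ("A", 1.15, False),
    ("A", 1.30, True), ("A", 1.40, True),
    ("A", 2.00, True),  # outside both bins, skipped
    ("B", 1.05, True), ("B", 1.25, False),
]
rates = rediscovery_by_bin(records)
```

Each disease then contributes one point to the scatter, with the "mid" percentage on the X-axis and the "low" percentage on the Y-axis.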

Figure 3. Increased availability of known risk variants and new methodologies are improving the ability to classify individuals according to disease

We calculated the area under the ROC curve (AUC) for five diseases using the samples from the WTCCC paper and SNPs discovered at different year ranges. X-axis: Discovery year of disease-associated variants included in the genetic risk score for each disease (based on year of inclusion in the GWAS Catalog). Y-axis: AUC using samples from the WTCCC case-control study. Performance of two predictors is shown, namely, a standard GRS (dashed lines) and a modified version of the GRS using the number of risk alleles times the weight for each SNP, as derived from the COMBI algorithm (solid lines). The number of SNPs used in each AUC calculation is indicated by the small numbers next to each point. Even though the number of risk SNPs has increased steadily, for both methodologies we observe a plateau in the predictive power of the genetic risk scores.
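To make the two predictors concrete, the sketch below computes a risk score per individual and the resulting AUC. It assumes that the standard GRS is an unweighted count of risk alleles, which the legend does not state explicitly, and the weights are arbitrary placeholders rather than output of the COMBI algorithm. The genotypes are toy values.

```python
def risk_score(genotype, weights=None):
    """genotype: risk-allele counts (0, 1, or 2) per SNP.
    Without weights this is a plain allele count; with weights it is
    the sum of allele count times weight over SNPs."""
    if weights is None:
        return float(sum(genotype))
    return sum(g * w for g, w in zip(genotype, weights))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random
    control, with ties counted as one half (equivalent to ROC AUC)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy genotypes for three SNPs; not WTCCC data.
cases = [[2, 1, 0], [1, 1, 1], [2, 2, 1]]
controls = [[0, 1, 0], [1, 0, 1], [1, 1, 1]]
weights = [0.5, 0.1, 0.2]  # hypothetical per-SNP weights

auc_count = auc([risk_score(g) for g in cases],
                [risk_score(g) for g in controls])
auc_weighted = auc([risk_score(g, weights) for g in cases],
                   [risk_score(g, weights) for g in controls])
```

The pairwise comparison is quadratic in sample size, which is fine for a toy example; a rank-based computation would be the usual choice for cohorts of thousands.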

Figure 4. Polygenic Risk Scores (PRS) stratify individuals according to risk of disease

X-axis: 55,210 samples of European ancestry from the Kaiser GERA Cohort are classified according to deciles of PRS for type 2 diabetes (T2D). Y-axis: Percentage of patients in each category. The graph shows the impact of increasing deciles of a weighted genetic risk score based on 414 LD-pruned SNPs associated with T2D (P < 10⁻³ in the 2014 trans-ethnic DIAGRAM GWAS). The PRS captures risk of disease according to genetic makeup of individuals, with a 2-fold enrichment of cases in the top versus the lowest decile.
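The decile stratification can be illustrated as below. This is a sketch that assumes the Y-axis is the percentage of cases within each decile; the scores and case labels are synthetic and were chosen only so that the top decile has twice the case rate of the lowest, mirroring the fold enrichment described in the legend.

```python
def case_rate_by_decile(scores, is_case):
    """Rank individuals by score, split them into ten near-equal bins,
    and return the percentage of cases in each bin (lowest scores first).
    Requires at least ten individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for d in range(10):
        members = order[d * n // 10:(d + 1) * n // 10]
        n_cases = sum(is_case[i] for i in members)
        rates.append(100.0 * n_cases / len(members))
    return rates


# Synthetic cohort of 100 individuals; not GERA data.
scores = [float(i) for i in range(100)]
is_case = [int(i % 10 < (2 if i >= 90 else 1)) for i in range(100)]

rates = case_rate_by_decile(scores, is_case)
fold_enrichment = rates[-1] / rates[0]
```

The ratio of the last to the first entry of `rates` is the top-versus-lowest decile enrichment.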
