Rare variant contribution to human disease in 281,104 UK Biobank exomes - PubMed
Nature. 2021 Sep;597(7877):527-532.
doi: 10.1038/s41586-021-03855-y. Epub 2021 Aug 10.
Ryan S Dhindsa # 1 , Keren Carss # 2 , Andrew R Harper 2 , Abhishek Nag 2 , Ioanna Tachmazidou 2 , Dimitrios Vitsios 2 , Sri V V Deevi 2 , Alex Mackay 3 , Daniel Muthas 3 , Michael Hühn 3 , Susan Monkley 3 , Henric Olsson 3 ; AstraZeneca Genomics Initiative; Sebastian Wasilewski 2 , Katherine R Smith 2 , Ruth March 4 , Adam Platt 5 , Carolina Haefliger 2 , Slavé Petrovski 6 7 8
- PMID: 34375979
- PMCID: PMC8458098
- DOI: 10.1038/s41586-021-03855-y
Quanli Wang et al. Nature. 2021 Sep.
Abstract
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
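To make the idea of a gene-based collapsing analysis concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that each sample is reduced to a single carrier/non-carrier flag per gene (carrier of at least one qualifying variant) and that carrier status is compared between cases and controls with a two-sided Fisher's exact test. The function names, the toy genotypes and the variant identifiers are all hypothetical; the qualifying-variant definitions used in the study are not given in this abstract.

```python
from math import comb


def collapse_carriers(genotypes, qualifying):
    """Flag each sample that carries at least one qualifying variant in the gene."""
    return [any(g.get(v, 0) > 0 for v in qualifying) for g in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def collapsing_test(genotypes, is_case, qualifying):
    """Collapse variants to carrier status, then test carriers against case status."""
    carrier = collapse_carriers(genotypes, qualifying)
    pairs = list(zip(carrier, is_case))
    case_car = sum(1 for car, case in pairs if car and case)
    case_non = sum(1 for car, case in pairs if not car and case)
    ctrl_car = sum(1 for car, case in pairs if car and not case)
    ctrl_non = sum(1 for car, case in pairs if not car and not case)
    if case_non * ctrl_car == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (case_car * ctrl_non) / (case_non * ctrl_car)
    p_value = fisher_exact_two_sided(case_car, case_non, ctrl_car, ctrl_non)
    return (case_car, case_non, ctrl_car, ctrl_non), odds_ratio, p_value


# Toy data (hypothetical): per-sample alternate-allele counts for one gene.
genotypes = [{"v1": 1}, {"v2": 1}, {"v3": 2}, {}, {}, {"v9": 1}, {}, {"v1": 1}]
is_case = [True] * 4 + [False] * 4
qualifying = {"v1", "v2", "v3"}  # "v9" is deliberately not a qualifying variant

table, odds_ratio, p_value = collapsing_test(genotypes, is_case, qualifying)
print(table, odds_ratio, round(p_value, 4))
```

Because three different variants all count toward the same carrier flag, the gene-level test can pick up a signal that no single variant would show on its own, which is the situation of high allelic heterogeneity described in the abstract.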
© 2021. The Author(s), under exclusive licence to Springer Nature Limited.
Conflict of interest statement
Q.W., R.S.D., K.C., A.R.H., A.N., I.T., D.V., S.V.V.D., A.M., D.M., M.H., S.M., H.O., S.W., K.R.S., R.M., A.P., C.H. and S.P. are current employees and/or stockholders of AstraZeneca.
Figures

a, The number of genes (y axis) with at least the number of PTV carriers (x axis) in 287,917 UKB participants of any ancestry. The dashed line corresponds to the minimum number of carriers typically required to detect individual PTVs with a MAF > 0.5%, that is, 2,873 carriers. Colours represent heterozygous (het.), putative compound heterozygous (comp. het.) and homozygous/hemizygous carriers (recessive). b, The MAF distribution of 632 genome-wide significant ExWAS variants associated with binary traits. The inset plot represents the same data limited to variants with MAF < 0.5%. c, The distribution of effect sizes for 509 common versus 123 rare (MAF < 0.5%) significant ExWAS variants. The plots in b and c include variants with the largest effect sizes achieved per gene. d, Percentage of ExWAS study-wide significant PTVs (n = 24) and missense variants (n = 326) that reflect known or novel gene–phenotype relationships. Variants capturing known gene–phenotype relationships were partitioned into those validated in (1) at least one but not all, or (2) all four publicly available databases: FinnGen release r5, OMIM, the GWAS Catalog (including GWAS Catalog variants within a 50-kb flanking sequence either side of the index variant), and the ClinVar pathogenic/likely pathogenic variant collection.

a, Gene–phenotype associations for binary traits. For gene–phenotype associations that appear in multiple collapsing models, we display only the association with the strongest effect size. The dashed line represents the genome-wide significant P value threshold (2 × 10^-9). The y axis is capped at −log10(P) = 50 and only associations with P < 10^-5 were plotted (n = 94,208). b, Enrichment of FDA-approved drug targets among significant binary traits, quantitative traits, OMIM genes and GWAS signals. P values were generated via two-sided Fisher’s exact test (*P < 10^-5, **P < 10^-20, ***P < 10^-70). Exact statistics: binary odds ratio (OR) = 7.38, 95% CI: 3.71–13.59, P = 1.5 × 10^-7; quantitative OR = 3.71, 95% CI: 2.28–5.76, P = 4.5 × 10^-7; OMIM OR = 5.95, 95% CI: 4.90–7.23, P = 1.1 × 10^-75; GWAS OR = 2.68, 95% CI: 2.12–3.32, P = 3.6 × 10^-23. Error bars represent 95% CIs. Contingency tables were created using each of the binary (n = 195), quantitative (n = 395), OMIM (n = 3,875) and GWAS (n = 10,692) categories, alongside approved targets from Informa Pharmaprojects (n = 463). c, Effect sizes for select gene associations per disease area. Genes with the highest OR for a chapter or with OR > 100 are labelled. d, Illustration of large effect gene–phenotype associations for select disease-related quantitative traits. FEV1/FVC, forced expiratory volume in 1 s/forced vital capacity ratio; HDL, high-density lipoprotein; LDL, low-density lipoprotein. Dashed line corresponds to a beta of 0.

a, b, The change in Phred scores between the pan-ancestry and European-only analyses for 46,769 binary associations (a) and 39,541 quantitative associations (b) stratified by chapter. For gene–phenotype associations that appear in multiple collapsing models, we display only those with the lowest P value. The green dots indicate associations that were not significant in the European analysis but were significant in the combined analysis. The orange dots represent associations that were originally significant in the European-only analysis but became not significant in the combined analysis. In both figures, the y axis is capped at ΔPhred = 40 (equivalent to a P value change of 0.0001).

a, The percentage of binary union traits assessed in the cohort per disease chapter. b, The percentage of quantitative traits assessed in the cohort per chapter. c, The median number of cases of European ancestry per binary union phenotype stratified by chapter with interquartile range depicted. The median number of European cases per binary union phenotype was 191 (interquartile range: 72-773). d, The median number of participants of European ancestry tested for quantitative traits stratified by chapter with interquartile ranges depicted. The median number of individuals tested for quantitative traits was 13,782 (interquartile range: 13,780-17,795). e, Histogram depicting the number of binary union phenotypes per patient. The x-axis was capped at 200 for visual clarity. The median number of binary union traits per European participant was 25 (interquartile range: 12-45) of a possible 4,911. f, The distribution of represented genetic ancestries in the sequenced cohort. EUR = European, SAS = South Asian, AFR = African, EAS = East Asian, AMR = American. g, The distribution of the number of rare (MAF <0.005%) qualifying variants (QVs) in OMIM-derived Mendelian disease genes per ancestral group. Error bars in (c, d) represent the interquartile range.

a, The number of genes (y-axis) with at least N rare (MAF < 0.01) protein-truncating variant (PTV) carriers (x-axis) in the cohort. Colours correspond to heterozygous (Het), putative compound heterozygous plus homozygous/hemizygous carriers (comp. het), and exclusively homozygous/hemizygous carriers (recessive). b, Distribution of the directions of effect for rare (MAF < 0.1%) non-synonymous variant associations with quantitative phenotypes. Only phenotypes with at least five significant non-synonymous variant associations (P ≤ 2 × 10^-9) in a given gene were considered.

Plot depicting significant gene–phenotype associations for quantitative traits. For gene–phenotype associations that appear in multiple collapsing models, we display only the association with the strongest effect size. The dashed line represents the genome-wide significant P value threshold (2 × 10^-9). The plot is capped at −log10(P) = 50 and only associations with P < 10^-5 are included (n = 22,549).

Forest plots demonstrating enrichment of drug targets curated in DrugBank and the Informa Pharmaprojects databases among significant (Tier 1) and nearly significant (Tier 2) binary trait associations, quantitative trait associations, OMIM genes, and GWAS signals. P-values were calculated via Fisher’s exact test (two-sided). Error bars represent 95% confidence intervals of the Odds Ratio. The total numbers of genes per category are as follows: DrugBank-derived (n = 386); Approved from Informa Pharmaprojects (n = 463); Phase III from Informa Pharmaprojects (n = 474); Phase II from Informa Pharmaprojects (n = 1006); Phase I from Informa Pharmaprojects (n = 921); Collapsing – Binary (Tier 1 n = 82; Tier 2 n = 113); Collapsing - Quantitative (Tier 1 n = 269; Tier 2 n = 126); OMIM (n = 3875); GWAS (Tier 1 n = 8975; Tier 2 n = 1717).

a, Distribution of lambda (inflation factor) values across all collapsing models for binary and quantitative traits. b, Venn diagram for gene-trait associations identified by three studies using the first tranche of 50K UKB. There are 81 distinct significant gene-trait associations (P < 3.4 × 10^-10) found among phenotypes that were studied by the three efforts (Supplementary Table 28). c, Percentage of suggestive binary gene-phenotype associations that became significant (sig) (P < 2 × 10^-9), non-significant (non-sig) (P > 1 × 10^-7) or remained suggestive (sugg) (2 × 10^-9 < P < 1 × 10^-7) with each successive UKB tranche release for binary traits (Supplementary Methods). 300Kv1 includes phenotypic data released up to April 2017, and 300Kv2 includes additional phenotypic data for the same set of samples released up to July 2020.
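For readers unfamiliar with the inflation factor in panel a, here is a minimal sketch of the conventional median-based genomic inflation factor. How the authors computed lambda for their collapsing models is not stated in this caption, so the definition below is an assumption: P values are converted to 1-degree-of-freedom chi-square statistics, and the observed median is divided by the median expected under the null (about 0.455). The function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed chi-square (1 df) statistic over its null expectation.

    Each P value must lie in (0, 1]. A value near 1 indicates a well-calibrated
    test; values well above 1 suggest inflated test statistics.
    """
    nd = NormalDist()
    # A two-sided P value p corresponds to z = inv_cdf(p / 2); chi-square = z ** 2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square with 1 df
    return median(chi2) / expected_median


# Evenly spaced P values mimic a well-calibrated null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))

# Squaring shifts the P values toward zero, which inflates lambda.
inflated_p = [p * p for p in null_p]
print(genomic_inflation(inflated_p) > 1)
```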

a, b, Distribution of the change between Phred (−10 × log10[P value]) scores from the pan-ancestry collapsing analysis and the European-only collapsing analysis for binary traits (a) and quantitative traits (b). The x-axis in both figures is capped at −50 and +50.
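The Phred transformation used in these panels can be sketched as follows. The formula −10 × log10(P) comes from the caption; the sign convention (pan-ancestry minus European-only), the function names and the labels "gained"/"lost" are assumptions made for illustration. The 2 × 10^-9 threshold is the genome-wide significance threshold quoted in the other figure captions.

```python
from math import log10

SIGNIFICANCE_THRESHOLD = 2e-9  # genome-wide threshold quoted in the captions


def phred(p):
    """Phred-scaled P value: larger scores mean stronger evidence."""
    return -10 * log10(p)


def delta_phred(p_pan, p_eur, cap=50.0):
    """Pan-ancestry minus European-only Phred score, clipped to [-cap, +cap]."""
    delta = phred(p_pan) - phred(p_eur)
    return max(-cap, min(cap, delta))


def significance_change(p_pan, p_eur, threshold=SIGNIFICANCE_THRESHOLD):
    """Label whether an association crosses the threshold between analyses."""
    if p_pan < threshold <= p_eur:
        return "gained"  # significant only in the pan-ancestry analysis
    if p_eur < threshold <= p_pan:
        return "lost"    # significant only in the European-only analysis
    return "unchanged"


print(delta_phred(1e-12, 1e-8))          # a 10,000-fold drop in P
print(significance_change(1e-10, 1e-8))
```

Under this convention a ΔPhred of 40 corresponds to a P value that is 10,000 times smaller in the pan-ancestry analysis, matching the 0.0001 factor mentioned in the caption for the stratified version of this plot.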