Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes - PubMed

  • Sat Jan 01 2022

Remo Monti et al. Nat Commun. 2022.

Abstract

Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain-of-function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain-of-function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
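
The contrast between collapsing and kernel-based tests can be illustrated with a small sketch. It assumes a weighted linear kernel of the kind used in SKAT-style tests; the kernels actually used in the paper are not specified in the abstract and may differ. The function names and toy data are invented for illustration, and only test statistics are computed (no p values).

```python
import numpy as np

def burden_statistic(genotypes, residuals, weights):
    """Squared score of the weighted sum of variants (collapsing-style test)."""
    scores = weights * (genotypes.T @ residuals)  # one weighted score per variant
    return float(scores.sum() ** 2)

def linear_kernel_statistic(genotypes, residuals, weights):
    """r^T K r with K = G diag(w^2) G^T, i.e. the sum of squared per-variant scores."""
    scores = weights * (genotypes.T @ residuals)
    return float((scores ** 2).sum())

# Invented toy data: six individuals, two rare variants in one gene.
# Carriers of variant 0 have raised phenotype residuals, carriers of
# variant 1 have lowered ones (opposite directions of effect).
G = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]], dtype=float)
r = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0])  # covariate-adjusted residuals
w = np.ones(2)                                   # hypothetical equal weights

print(burden_statistic(G, r, w))         # opposite effects cancel in the sum
print(linear_kernel_statistic(G, r, w))  # squared scores do not cancel
```

In this toy example the collapsed score is zero because the two effects cancel, whereas the kernel statistic stays positive. This is one plausible reason a kernel-based test can retain power when a gene carries both loss-of-function and gain-of-function variants.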

© 2022. The Author(s).

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Rare-variant association testing pipeline.

Exome sequencing measures exon-proximal genetic variants. All variants are subjected to functional variant effect prediction (VEP). Qualifying variants are determined based on the variant effect predictions and minor allele frequencies (MAF < 0.1%) and categorized based on their predicted functional impacts (protein loss of function, missense, splicing, RBP-binding). Finally, we test the different categories of qualifying variants in gene-based association tests against 30 biomarkers using gene-based variant collapsing and kernel-based tests.
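
The filtering and grouping step described in this caption might look roughly like the sketch below. Only the MAF cutoff (< 0.1%) and the four category names come from the caption. The `Variant` record, the `score` field and the `min_score` cutoff are hypothetical stand-ins for whatever the variant effect predictors report, and the example records are invented.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.001  # MAF < 0.1%, as stated in the caption

@dataclass
class Variant:
    variant_id: str
    gene: str
    maf: float
    category: str  # "pLOF", "missense", "splice" or "rbp"
    score: float   # hypothetical predicted-impact score in [0, 1]

def qualifying_by_category(variants, min_score=0.5):
    """Group rare, predicted-impactful variants by (gene, category)."""
    groups = {}
    for v in variants:
        # min_score is an assumed cutoff; the caption gives no score threshold.
        if v.maf < MAF_CUTOFF and v.score >= min_score:
            groups.setdefault((v.gene, v.category), []).append(v.variant_id)
    return groups

# Invented example records.
example = [
    Variant("v1", "GENE_A", 0.0002, "missense", 0.9),
    Variant("v2", "GENE_A", 0.0100, "missense", 0.9),  # too common
    Variant("v3", "GENE_A", 0.0001, "pLOF", 1.0),
    Variant("v4", "GENE_B", 0.0005, "splice", 0.1),    # low predicted impact
]
groups = qualifying_by_category(example)
```

Each resulting group would then be passed to a gene-based test for one biomarker at a time.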

Fig. 2. Association tests overview.

a Histograms of the number of qualifying variants per tested gene for the different variant categories. Ranges are truncated at 300 variants, which affected 730 genes for missense, 84 for splice, 7 for pLOF, and 15 for rbp. b Bar plot of the number of significant genes found by testing qualifying variants in the different categories. Tests for splice and missense variants (left) were dynamically combined with pLOF variants, i.e., two p values, one arising from a test including only missense/splice variants and one combining those variants with pLOF variants, were combined using the Cauchy combination test. c Bar plot showing 193 significant gene-biomarker associations for 28 biomarkers (x-axis). Bars in panels (a) and (b) are colored by the variant-type (or a combination thereof) which gave the lowest p value (lead annotation).
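
As a minimal sketch of the Cauchy combination test mentioned in panel (b), the function below uses the standard formulation: each p value is mapped to a Cauchy variate, the variates are averaged, and the average is mapped back to a p value. Equal weights are an assumption here, since the caption does not state how the two p values were weighted, and the input values are invented.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p values (each strictly between 0 and 1) into one p value."""
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)  # assumed equal weights
    # Map each p value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Map the weighted average back through the Cauchy survival function.
    return 0.5 - math.atan(t / sum(weights)) / math.pi

# Invented inputs: one p value from missense variants alone and one from
# missense variants combined with pLOF variants.
p_combined = cauchy_combination([1e-6, 0.04])
```

The combined value is dominated by the smaller input, which is why the approach suits the case where only one of the two tests carries signal. For extremely small p values the tangent transform loses floating-point precision, so a production implementation would need an approximation for that regime.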

Fig. 3. Comparison of EUR vs. all-ancestry (AA) analyses.

a Venn diagram of the number of significant gene-biomarker associations (p < 1.61 × 10⁻⁸, FWER ≤ 0.05) identified in either analysis (Supplementary Data 1 and 3). b Scatter plot showing the smallest p value across variant-effect- and test-types for each gene-biomarker association in the EUR-analysis (x-axis) vs. in the analysis with all ancestries (y-axis) on the negative log10 scale. The thick black box denotes the area highlighted in panel (c). Associations significant in either analysis are color-coded according to (a) and drawn thicker. The significance threshold is drawn as a light gray dashed horizontal/vertical line. Non-significant associations are drawn in gray. Associations with p > 0.1 in both analyses are not shown. d For significant gene-biomarker associations in either analysis, we show if the variant effect types and the test-types (gbvc or kernel-based) which gave the smallest p values were consistent between the two analyses.
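
The caption does not say how the significance threshold was derived. If it came from a Bonferroni correction (an assumption, not something stated here), the arithmetic linking the family-wise error rate to the per-test threshold would be as sketched below. The function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the FWER at or below alpha."""
    return alpha / n_tests

def implied_number_of_tests(alpha, threshold):
    """Number of tests a given Bonferroni threshold would correspond to."""
    return alpha / threshold

# Values from the caption: FWER <= 0.05 and p < 1.61e-8.
n_implied = implied_number_of_tests(0.05, 1.61e-8)
print(round(n_implied))
```

Under that assumption, the threshold would correspond to roughly 3.1 million tests.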

Fig. 4. Local collapsing of missense variants.

Dosage box plots showing the alternative amino acid counts (x-axis) against the covariate-adjusted quantile-transformed phenotypes (y-axis). Collapsing variants by amino acid position within significant genes (FWER ≤ 0.05) identified negative associations of PIEZO1 R1772-variants with HbA1c (p = 1.61 × 10⁻⁸), ABCA1 W590-variants with C-reactive protein (p = 3.43 × 10⁻³⁸), and HNF4A R136-variants with Apolipoprotein A (p = 1.2 × 10⁻¹⁰). P values were derived using the score test after weighting and collapsing variants (Methods). Collapsing together with ClinVar variants with reported conditions (marked with "*") helps place novel variants into disease context. For all three associations, collapsed p values were lower than those of the single variants. Center lines denote the medians. The lower and upper hinges indicate 25th and 75th percentiles, whiskers extend to the largest/lowest values no further than 1.5 × IQR away from the upper/lower hinges and black points denote outliers. Maxima and minima, from left to right: −5.33 to 4.95, −4.05 to 5.69, and −4.7 to 4.5.
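
Local collapsing followed by a score test could be sketched as follows. The exact weighting and test are described in the paper's Methods, which are not part of this page, so this sketch assumes that variants at the same amino acid position are combined by a weighted sum of allele counts and that the phenotype has already been adjusted for covariates. The function names are illustrative.

```python
import math
import numpy as np

def collapse_by_position(genotypes, positions, weights=None):
    """Sum (weighted) alternative-allele counts of variants sharing a position."""
    genotypes = np.asarray(genotypes, dtype=float)  # (n_individuals, n_variants)
    positions = np.asarray(positions)
    if weights is None:
        weights = np.ones(genotypes.shape[1])       # assumed equal weights
    weighted = genotypes * np.asarray(weights, dtype=float)
    return {int(pos): weighted[:, positions == pos].sum(axis=1)
            for pos in np.unique(positions)}

def score_test(burden, phenotype):
    """One-degree-of-freedom score test for a single collapsed variable.

    Assumes at least one carrier, so the centered burden is not all zero.
    """
    x = np.asarray(burden, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    x = x - x.mean()
    r = y - y.mean()                           # residuals under the null model
    sigma2 = r @ r / (len(r) - 1)              # null residual variance
    stat = (x @ r) ** 2 / (sigma2 * (x @ x))   # ~ chi-square(1) under the null
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) tail probability
```

Called on a genotype matrix whose columns are missense variants and a list of their amino acid positions, `collapse_by_position` returns one dosage vector per position, and each vector can be passed to `score_test` with the adjusted phenotype.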

Fig. 5. Variants prioritized by deep learning models.

a Dosage box plots showing covariate-adjusted quantile-transformed phenotypes against minor allele counts for variants in SLC9A5 and ANGPTL3/DOCK7. A predicted splice variant 16:67270978:G:A is negatively associated with HDL cholesterol (p = 7.83 × 10⁻¹², score test), whereas intronic 1:62598067:T:C is negatively associated with Triglycerides (p = 1.37 × 10⁻²⁵, score test). The numbers in brackets denote the number of carriers of at least one alternative allele. Center lines denote the medians. The lower and upper hinges indicate 25th and 75th percentiles. Whiskers extend to the largest/lowest values no further than 1.5 × IQR away from the upper/lower hinges and black points denote outliers. Maxima and minima, from left to right: −5.24 to 5.18, −4.62 to 4.5. b DeepRiPe binding probabilities for 1:62598067:T:C for three RBPs in HepG2 cells. While predicted probabilities for the reference sequence are ambiguous, the alternative allele shifts binding probabilities in favor of QKI and HNRNPL. All RBPs with absolute predicted variant effects above 0.2 and binding probabilities greater than 0.5 for either reference or alternative alleles are shown.
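
The display rule in panel (b) can be written as a small filter. The two cutoffs (0.2 and 0.5) come from the caption. Defining the variant effect as the alternative-allele probability minus the reference-allele probability is an assumption, and the RBP labels and probabilities below are invented placeholders rather than DeepRiPe output.

```python
def rbps_to_show(predictions, min_effect=0.2, min_prob=0.5):
    """Select RBPs by variant effect size and binding probability.

    predictions maps an RBP name to (p_ref, p_alt) binding probabilities.
    """
    shown = []
    for rbp, (p_ref, p_alt) in predictions.items():
        effect = p_alt - p_ref  # assumed definition of the variant effect
        if abs(effect) > min_effect and max(p_ref, p_alt) > min_prob:
            shown.append(rbp)
    return shown

# Invented placeholder values, not model output.
toy_predictions = {
    "RBP_A": (0.45, 0.80),  # large shift, bound under the alternative allele
    "RBP_B": (0.10, 0.35),  # large shift, but never above 0.5
    "RBP_C": (0.70, 0.65),  # bound, but barely affected by the variant
}
selected = rbps_to_show(toy_predictions)
```

Only RBPs that satisfy both conditions are kept, so a strongly bound but unaffected RBP is filtered out in the same way as a weakly bound one.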
