Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure
Jong Wha J Joo et al. Genetics. 2016 Dec.
Abstract
A typical genome-wide association study tests the correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these approaches do not consider the effects of population structure and, as a result, may produce a substantial number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA (generalized analysis of molecular variance for mixed-model analysis), which is capable of simultaneously analyzing many phenotypes while correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect the true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and of the gut microbiome of mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.
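To make the idea of testing one variant against many phenotypes at once concrete, the following is a minimal sketch of a generic distance-based multivariate test of the kind the abstract contrasts GAMMA with. It is not the authors' implementation and it does not correct for population structure, which is the gap GAMMA is described as addressing. The function names `pseudo_f` and `permutation_p`, the use of Euclidean distance, and the permutation scheme are all assumptions made for illustration.

```python
import numpy as np


def pseudo_f(snp, phenotypes):
    """Pseudo-F ratio for one SNP against an (n x m) phenotype matrix.

    Assumes Euclidean distance between individuals, so the statistic
    reduces to explained / residual sum of squares pooled over all
    phenotypes. Degrees-of-freedom constants are omitted.
    """
    snp = np.asarray(snp, dtype=float)
    Y = np.asarray(phenotypes, dtype=float)
    X = np.column_stack([np.ones(len(snp)), snp])  # intercept + genotype
    H = X @ np.linalg.pinv(X)                      # projection (hat) matrix
    Yc = Y - Y.mean(axis=0)                        # centre each phenotype
    fitted = H @ Yc                                # part explained by the SNP
    resid = Yc - fitted                            # part left unexplained
    return float(np.sum(fitted ** 2) / np.sum(resid ** 2))


def permutation_p(snp, phenotypes, n_perm=199, seed=0):
    """Permutation P-value: shuffle genotypes across individuals."""
    rng = np.random.default_rng(seed)
    snp = np.asarray(snp, dtype=float)
    observed = pseudo_f(snp, phenotypes)
    hits = sum(
        pseudo_f(rng.permutation(snp), phenotypes) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + hits) / (1 + n_perm)
```

The degrees-of-freedom factors are left out because they are identical for every permutation and so do not change the permutation P-value. Shuffling genotypes freely treats individuals as exchangeable, which is exactly the assumption that population structure violates.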
Keywords: mixed models; multivariate analysis; population structure.
Copyright © 2016 by the Genetics Society of America.
Figures

The results of different methods applied to a simulated data set. The x-axis shows SNP locations and the y-axis shows the −log10 P-values of associations between each SNP and all the genes. Blue ↓ marks the locations of the true trans-regulatory hotspots. (A) The result of the standard t-test. (B) The result of EMMA. For (A) and (B), the −log10 P-values were averaged over all of the genes for each SNP. (C) The result of MDMR. (D) The result of GAMMA.

An eQTL map of a real yeast data set. P-values are estimated from NICE (Joo et al. 2014). The x-axis corresponds to SNP locations and the y-axis corresponds to the gene locations. The intensity of each point on the map represents the significance of the association. The diagonal band represents the cis effects and the vertical bands represent trans-regulatory hotspots.

The results of MDMR and GAMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values. Blue * above each plot marks putative hotspots that were reported in a previous study (Joo et al. 2014) for the yeast data. (A) The result of MDMR. (B) The result of GAMMA.

The results of the standard t-test and EMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log10 of the P-values over the genes. Blue * above each plot marks putative hotspots that were reported in a previous study (Joo et al. 2014) for the yeast data. (A) The result of the standard t-test. (B) The result of EMMA.

The result of GAMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values.

The result of MDMR applied to chromosome 19 of a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values.

The results of the standard t-test and EMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log10 of the P-values over the genera. (A) The result of the standard t-test. (B) The result of EMMA.