Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Jong Wha J Joo et al. Genetics. 2016 Dec.

Abstract

A typical genome-wide association study tests for correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these approaches do not account for the effects of population structure and may therefore produce a substantial number of false-positive identifications. Here, we introduce a new methodology, GAMMA (generalized analysis of molecular variance for mixed-model analysis), which can analyze many phenotypes simultaneously while correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect true effects or produce many false-positive identifications. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to have true biological mechanisms.

Keywords: mixed models; multivariate analysis; population structure.

Figures

**Figure 1**
The results of different methods applied to a simulated data set. The x-axis shows SNP locations and the y-axis shows −log₁₀ P-values of associations between each SNP and all the genes. Blue ↓ shows the location of the true *trans*-regulatory hotspots. (A) The result of the standard t-test. (B) The result of EMMA. For (A) and (B), we averaged the −log₁₀ P-values over all of the genes for each SNP. (C) The result of MDMR. (D) The result of GAMMA.
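
The per-SNP averaging described for panels (A) and (B) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the plotted quantity is −log₁₀ of each P-value (as in the later figures), and the function name `mean_neg_log10_p` and the toy P-value matrix are hypothetical.

```python
import math


def mean_neg_log10_p(p_values):
    """Average -log10(P) over genes for each SNP.

    p_values[i][j] is assumed to hold the P-value for the association
    between SNP i and gene j (hypothetical layout for illustration).
    Returns one score per SNP.
    """
    scores = []
    for row in p_values:
        if not row:
            raise ValueError("each SNP needs at least one gene P-value")
        for p in row:
            if not 0.0 < p <= 1.0:
                raise ValueError(f"P-value out of range (0, 1]: {p}")
        scores.append(sum(-math.log10(p) for p in row) / len(row))
    return scores


# Toy input: 2 SNPs x 3 genes. The second SNP is associated with
# every gene, as a trans-regulatory hotspot would be.
p_values = [
    [0.5, 0.1, 0.01],
    [1e-4, 1e-6, 1e-5],
]
scores = mean_neg_log10_p(p_values)
print(scores)  # the second SNP scores much higher than the first
```

Averaging gives a single value per SNP that can be plotted against SNP location. A SNP that affects many genes weakly can still stand out, which is why a summary of this kind is useful for comparing single-phenotype tests with multivariate methods.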

**Figure 2**
An eQTL map of a real yeast data set. P-values are estimated from NICE (Joo *et al.* 2014). The x-axis corresponds to SNP locations and the y-axis corresponds to the gene locations. The intensity of each point on the map represents the significance of the association. The diagonal band represents the *cis* effects and the vertical bands represent *trans*-regulatory hotspots.

**Figure 3**
The results of MDMR and GAMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log₁₀ of P-values. Blue * above each plot shows putative hotspots that were reported in a previous study (Joo *et al.* 2014) for the yeast data. (A) The result of MDMR. (B) The result of GAMMA.

**Figure 4**
The results of the standard t-test and EMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log₁₀ of P-values over the genes. Blue * above each plot shows putative hotspots that were reported in a previous study (Joo *et al.* 2014) for the yeast data. (A) The result of the standard t-test. (B) The result of EMMA.

**Figure 5**
The result of GAMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log₁₀ of P-values.

**Figure 6**
The result of MDMR applied to chromosome 19 of a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log₁₀ of P-values.

**Figure 7**
The results of the standard t-test and EMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log₁₀ of P-values over the genera. (A) The result of the standard t-test. (B) The result of EMMA.

Cited by

Finding associated variants in genome-wide association studies on multiple traits.
Gai L, Eskin E. Bioinformatics. 2018 Jul 1;34(13):i467-i474. doi: 10.1093/bioinformatics/bty249. PMID: 29949991. Free PMC article.
Interaction models matter: an efficient, flexible computational framework for model-specific investigation of epistasis.
Batista S, Madar VS, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Chitre AS, Palmer AA, Moore JH. BioData Min. 2024 Feb 28;17(1):7. doi: 10.1186/s13040-024-00358-0. PMID: 38419006. Free PMC article.
Statistical methods to detect pleiotropy in human complex traits.
Hackinger S, Zeggini E. Open Biol. 2017 Nov;7(11):170125. doi: 10.1098/rsob.170125. PMID: 29093210. Free PMC article. Review.
JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression.
Mbatchou J, McPeek MS. Am J Hum Genet. 2024 Aug 8;111(8):1750-1769. doi: 10.1016/j.ajhg.2024.06.010. Epub 2024 Jul 17. PMID: 39025064. Free PMC article.
Multiple testing correction in linear mixed models.
Joo JW, Hormozdiari F, Han B, Eskin E. Genome Biol. 2016 Apr 1;17:62. doi: 10.1186/s13059-016-0903-6. PMID: 27039378. Free PMC article.
