Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Jong Wha J Joo et al. Genetics. 2016 Dec.

Abstract

A typical genome-wide association study tests for correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these approaches do not consider the effects of population structure and may therefore produce a substantial number of false positives. Here, we introduce a new methodology, GAMMA (generalized analysis of molecular variance for mixed-model analysis), which can analyze many phenotypes simultaneously while correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect true effects or produce many false positives. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to have true biological mechanisms.

Keywords: mixed models; multivariate analysis; population structure.

Copyright © 2016 by the Genetics Society of America.

Figures

Figure 1

The results of different methods applied to a simulated data set. The x-axis shows SNP locations and the y-axis shows −log10 P-values of associations between each SNP and all the genes. Blue ↓ shows the location of the true trans-regulatory hotspots. (A) The result of the standard t-test. (B) The result of EMMA. For (A) and (B), we averaged the −log10 P-values over all of the genes for each SNP. (C) The result of MDMR. (D) The result of GAMMA.
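The per-SNP summary used for panels (A) and (B) can be illustrated with a minimal sketch: test each SNP against each gene separately, then average the −log10 P-values over genes. This is not the authors' code. The function names, the simulated data, and the large-sample normal approximation to the t statistic are all assumptions made for illustration, and the sketch performs no correction for population structure.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def neg_log10_p(r, n):
    """-log10 of an approximate two-sided P-value for correlation r on n samples.

    Assumption: the t statistic is referred to a standard normal, which is
    only reasonable for large n; an exact t-test would use the t distribution.
    """
    r = max(-0.999999, min(0.999999, r))  # avoid division by zero at |r| = 1
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return -math.log10(max(p, 1e-300))  # floor P so the log stays finite


def mean_neg_log10_p(snps, genes):
    """For each SNP, average -log10 P over all genes (one test per SNP-gene pair)."""
    n = len(snps[0])
    return [
        sum(neg_log10_p(pearson_r(snp, gene), n) for gene in genes) / len(genes)
        for snp in snps
    ]


# Hypothetical simulated data: one SNP (the "hotspot") shifts every gene's expression.
random.seed(0)
n_ind, n_snps, n_genes, hotspot = 200, 20, 30, 5
snps = [[random.choice([0, 1]) for _ in range(n_ind)] for _ in range(n_snps)]
genes = [
    [1.0 * snps[hotspot][i] + random.gauss(0.0, 1.0) for i in range(n_ind)]
    for _ in range(n_genes)
]

scores = mean_neg_log10_p(snps, genes)
best = max(range(n_snps), key=scores.__getitem__)
print("SNP with the highest averaged score:", best)
```

Because every individual in this toy data set is simulated independently, there is no population structure to confound the tests; with related individuals, the same averaging could inflate scores at SNPs that merely track ancestry, which is the problem the abstract describes.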

Figure 2

An eQTL map of a real yeast data set. P-values are estimated from NICE (Joo et al. 2014). The x-axis corresponds to SNP locations and the y-axis corresponds to the gene locations. The intensity of each point on the map represents the significance of the association. The diagonal band represents the cis effects and the vertical bands represent trans-regulatory hotspots.

Figure 3

The results of MDMR and GAMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values. Blue * above each plot marks putative hotspots that were reported in a previous study (Joo et al. 2014) for the yeast data. (A) The result of MDMR. (B) The result of GAMMA.

Figure 4

The results of the standard t-test and EMMA applied to a yeast data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log10 of the P-values over the genes. Blue * above each plot marks putative hotspots that were reported in a previous study (Joo et al. 2014) for the yeast data. (A) The result of the standard t-test. (B) The result of EMMA.

Figure 5

The result of GAMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values.

Figure 6

The result of MDMR applied to chromosome 19 of a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to −log10 of the P-values.

Figure 7

The results of the standard t-test and EMMA applied to a gut microbiome data set. The x-axis corresponds to SNP locations and the y-axis corresponds to the sum of −log10 of the P-values over the genera. (A) The result of the standard t-test. (B) The result of EMMA.

References

Alter O., Brown P. O., Botstein D., 2000. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97: 10101–10106.
Aschard H., Vilhjálmsson B. J., Greliche N., Morange P.-E. E., Trégouët D.-A. A., et al., 2014. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am. J. Hum. Genet. 94: 662–676.
Bennett B. J., Farber C. R., Orozco L., Kang H. M., Ghazalpour A., et al., 2010. A high-resolution association mapping panel for the dissection of complex traits in mice. Genome Res. 20: 281–290.
Berger M., Stassen H. H., Köhler K., Krane V., Mönks D., et al., 2006. Hidden population substructures in an apparently homogeneous population bias association studies. Eur. J. Hum. Genet. 14: 236–244.
Bokulich N. A., Subramanian S., Faith J. J., Gevers D., Gordon J. I., et al., 2013. Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat. Methods 10: 57–59.
