Analysis of East Asia genetic substructure using genome-wide SNP arrays
Chao Tian et al. PLoS One. 2008.
Abstract
Accounting for population genetic substructure is important in reducing type 1 errors in genetic studies of complex disease. As efforts to understand complex genetic disease are expanded to different continental populations, an understanding of genetic substructure within these continents will be useful in the design and execution of association tests. In this study, population differentiation (Fst) and Principal Components Analyses (PCA) are examined using >200 K genotypes from multiple populations of East Asian ancestry. The population groups included those from the Human Genome Diversity Panel [Cambodian, Yi, Daur, Mongolian, Lahu, Dai, Hezhen, Miaozu, Naxi, Oroqen, She, Tu, Tujia, Xibo, and Yakut], HapMap [Han Chinese (CHB) and Japanese (JPT)], and East Asian or East Asian American subjects of Vietnamese, Korean, Filipino and Chinese ancestry. Paired Fst (Weir and Cockerham) showed close relationships between CHB and several large East Asian population groups (CHB/Korean, 0.0019; CHB/JPT, 0.00651; CHB/Vietnamese, 0.0065), with larger separation from Filipino (CHB/Filipino, 0.014). Low levels of differentiation were also observed between Dai and Vietnamese (0.0045) and between Vietnamese and Cambodian (0.0062). Similarly, small Fst values were observed among different presumed Han Chinese populations originating in different regions of mainland China and Taiwan (Fst values <0.0025 with CHB). For PCA, the first two PCs showed a pattern of relationships that closely followed the geographic distribution of the different East Asian populations. PCA showed substructure both between different East Asian groups and within the Han Chinese population. These studies have also identified a subset of East Asian substructure ancestry informative markers (EASTASAIMS) that may be useful for future complex genetic disease association studies in reducing type 1 errors and in identifying homogeneous groups that may increase the power of such studies.
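To give a sense of scale for the pairwise Fst values quoted above, the following is a minimal sketch of a simple heterozygosity-based Fst computed from allele frequencies in two populations. It is not the Weir and Cockerham estimator named in the abstract: it applies no sample-size correction and weights both populations equally. The function name `simple_fst` and the allele frequencies are hypothetical and chosen only for illustration.

```python
def simple_fst(freqs_pop1, freqs_pop2):
    """Heterozygosity-based Fst for two populations (ratio of sums over SNPs).

    freqs_pop1, freqs_pop2: per-SNP frequencies of the same allele in each
    population. Assumes equal population weights and no sample-size
    correction, so this is NOT the Weir and Cockerham estimator.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need the same SNPs")

    numerator = 0.0    # sum over SNPs of (H_T - H_S)
    denominator = 0.0  # sum over SNPs of H_T
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total

    if denominator == 0:
        raise ValueError("all SNPs are monomorphic in the pooled sample")
    return numerator / denominator


# Hypothetical frequencies for two SNPs (not data from the study).
fst_example = simple_fst([0.5, 0.2], [0.5, 0.4])
fst_identical = simple_fst([0.3, 0.7], [0.3, 0.7])
```

Under these assumptions, two populations with identical allele frequencies give an Fst of zero, and the value grows with the squared frequency difference at each SNP.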
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures

Graphic representations of the first two PCs, based on analysis with >200 K SNPs, are shown. The color code shows the subgroup of subjects for each population group. The subjects included Filipino (FIL), Vietnamese (VIET), Lahu, Dai, Cambodian (CAMB), Han Chinese (CHB), Mongolian (MGL), Oroqen (ORQ), Daur, Korean (KOR), Chinese Americans from Taiwan (TWN), Yi, Hezhen (HEZ), Miaozu (MIAO), Naxi, She, Tu, Tujia (TUJ), Xibo, Chinese Americans (CHA), Japanese (JPT), and Yakut (YAK). A, Analysis including the Yakut population group. B, Analysis without Yakut. C, Approximate geographic origin of each population group, depicted on a map of East Asia (downloaded from University of Texas Library website). The positions of the HGDP population groups are based on the collection site information, and the other population groups were placed based on self-identified country or region of origin. [Note: Yakut are not shown on the map since this population is from Siberia and is a considerable distance north of the depicted region.] D, Rotated results of PC1 and PC2, shown to illustrate the geographic correspondence of ethnic group locations.

Color key shows groups as defined in Fig 1. A, PC3 and PC4. B, PC5 and PC6. C, PC7 and PC8.

A, The eigenvalues for each PC are shown for both the entire group of EAS (excluding Yakut) and for the five most populous ethnic groups (Chinese, Korean, Japanese, Filipino and Vietnamese). B, The proportion of the adjusted eigenvalue for each PC for the first 10 PCs is shown. For this measurement the PC10 eigenvalue for each group was used as the baseline. [Note: the eigenvalues plateau as shown in panel A and there is no discernible substructure beyond PC10 for these analyses (Table 2)]. For each PC, the PC10 eigenvalue was subtracted to determine an “adjusted” eigenvalue. The % substructure variation measurement was each adjusted eigenvalue divided by the sum of the adjusted eigenvalues (PC1 through PC10).
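The adjusted-eigenvalue calculation described in this caption can be sketched as follows. This is only an illustration of the stated arithmetic (subtract the PC10 eigenvalue, then divide each adjusted value by the sum over PC1 through PC10); the function name `substructure_proportions` and the eigenvalues are made up and are not taken from the study.

```python
def substructure_proportions(eigenvalues, baseline_index=9):
    """Share of substructure variation per PC, using PC10 as the baseline.

    eigenvalues: eigenvalues ordered PC1, PC2, ... (at least 10 values).
    baseline_index: zero-based position of the baseline PC (9 means PC10).
    """
    if len(eigenvalues) <= baseline_index:
        raise ValueError("need eigenvalues up to and including the baseline PC")

    baseline = eigenvalues[baseline_index]
    # Adjusted eigenvalue: each PC's eigenvalue minus the baseline eigenvalue.
    adjusted = [ev - baseline for ev in eigenvalues[: baseline_index + 1]]

    total = sum(adjusted)
    if total <= 0:
        raise ValueError("adjusted eigenvalues must sum to a positive value")
    # Proportion of the total adjusted eigenvalue attributed to each PC.
    return [value / total for value in adjusted]


# Made-up eigenvalues for PC1..PC10 (illustrative only).
example_eigenvalues = [5.0, 3.0, 2.0, 1.5, 1.4, 1.3, 1.2, 1.1, 1.05, 1.0]
proportions = substructure_proportions(example_eigenvalues)
```

By construction the proportions sum to 1 and the baseline PC itself contributes zero, so the measure describes only variation above the plateau.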

A, Results from PCA performed together with EAS populations. B, PCA performed using only Chinese and Chinese American participants. The color coded population groups included the HapMap Han Chinese from Beijing (CHB), HGDP Han Chinese (HAN), HGDP North Han Chinese (HAN_N), Chinese American North (CHAN), Chinese American South (CHAS), Chinese American Central (CHAC), Taiwan Chinese American (TWN), Korean (KOR), and Hezhen (HEZ).

A, PCA analysis of tester population samples (see Table 3) using 200 K SNPs. B, PCA analysis of same tester population samples using 1500 EAS-AIMs.