Ascertainment biases in SNP chips affect measures of population divergence - PubMed
Anders Albrechtsen et al. Mol Biol Evol. 2010 Nov.
Abstract
Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have used these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as FST or principal component analyses. However, single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process, in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of ascertainment bias on inferences regarding genetic differentiation among populations in one of the common genome-wide genotyping platforms. We generate SNP genotyping data for individuals who have previously been subject to partial genome-wide Sanger sequencing and compare inferences based on genotyping data to inferences based on direct sequencing. In addition, we analyze publicly available genome-wide data. We demonstrate that the ascertainment biases will distort measures of human diversity and possibly change conclusions drawn from these measures in sometimes unexpected ways. We also show that details of the genotype-calling algorithms can have a surprisingly large effect on population genetic inferences. We not only present a correction of the frequency spectrum for the widely used Affymetrix SNP chips but also show that such corrections are difficult to generalize among studies.
Figures

Folded frequency spectrum for 19 African Americans for synonymous sequencing data and for SNP chip data. Only SNPs without missing data are included.

Folded frequency spectrum for 20 European Americans for synonymous sequencing data and for SNP chip data. Only SNPs without missing data are included.
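
As an illustration of what the two captions above describe, the following is a minimal sketch of how a folded frequency spectrum can be computed from diploid genotypes while dropping SNPs with missing data. The function name `folded_spectrum` and the input encoding (allele counts 0, 1, 2, with `None` for a missing call) are assumptions made for this example and do not come from the paper.

```python
from collections import Counter

def folded_spectrum(genotypes, n_individuals):
    """Return the folded spectrum as proportions for minor allele counts 1..n_individuals.

    genotypes: one list per SNP holding each individual's allele count
    (0, 1, or 2), or None for a missing call (assumed encoding).
    """
    n_chrom = 2 * n_individuals
    counts = Counter()
    for snp in genotypes:
        # Keep only SNPs without missing data, as in the captions.
        if len(snp) != n_individuals or any(g is None for g in snp):
            continue
        c = sum(snp)
        minor = min(c, n_chrom - c)  # folding: count the rarer allele
        if minor > 0:                # skip sites that are monomorphic in the sample
            counts[minor] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0
            for k in range(1, n_individuals + 1)]
```

Folding is used when it is unknown which allele is ancestral, so a SNP with 3 copies of one allele and a SNP with 3 copies of the other fall in the same bin.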

The 2D Celera frequency spectra (density) for 20 European Americans and 19 African Americans for the SNP chip data (left) and the synonymous SNPs from the resequencing data (right).

The effect of ascertainment bias on the frequency spectrum. The frequency spectrum is projected down to ten individuals for all populations for the NIEHS data. We simulated ascertainment in each of the five populations with an ascertainment sample of ten individuals.
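
One common way to project a frequency spectrum down to a smaller sample and to model ascertainment is hypergeometric subsampling. The sketch below assumes that approach; the caption does not state which procedure the authors used. The names `project_spectrum` and `prob_polymorphic` are hypothetical, and the ascertainment panel is assumed to be a random subset of the typed chromosomes.

```python
from math import comb

def project_spectrum(spectrum, m):
    """Project an unfolded spectrum from n chromosomes down to m chromosomes.

    spectrum[k] is the number of SNPs with k copies of an allele among
    n = len(spectrum) - 1 chromosomes. Each SNP is spread over the counts j
    it could show in a random subsample of m chromosomes (hypergeometric).
    """
    n = len(spectrum) - 1
    out = [0.0] * (m + 1)
    for k, weight in enumerate(spectrum):
        for j in range(m + 1):
            # math.comb returns 0 when the lower index exceeds the upper one.
            out[j] += weight * comb(k, j) * comb(n - k, m - j) / comb(n, m)
    return out

def prob_polymorphic(k, n, d):
    """Probability that a SNP with k of n copies is variable in a panel of d chromosomes."""
    return 1.0 - (comb(n - k, d) + comb(k, d)) / comb(n, d)
```

Ten diploid individuals correspond to `m = 20` chromosomes. Under these assumptions, weighting each frequency class by `prob_polymorphic` shows why small panels enrich for common alleles: rare alleles are seldom seen in the panel and are therefore seldom placed on the chip.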

Folded frequency spectrum for 45 unrelated Japanese HapMap individuals. The individuals were genotyped using approximately 500,000 SNPs on the Affymetrix SNP 500k chip set. The frequency spectrum is shown for SNPs binned based on their maximum BRLMM confidence score.

Pairwise FST between European Americans and Africans for different ascertainment schemes. Standard error bars were estimated using 1,000 bootstrap samples. For the black bars, the SNPs are sampled independently and for the gray bars, the SNPs are sampled genewise from the 251 genes. The ascertainment was performed in African Americans and a varying number of individuals were used for the ascertainment. NAsc is the number of ascertained SNPs. The FST estimate using all SNPs (no ascertainment bias) is labeled “None.”
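
The two bootstrap schemes in this caption (SNPs resampled independently versus gene-wise) can be sketched as follows. The caption does not name the FST estimator, so this example swaps in Hudson's estimator computed as a ratio of sums, purely as an assumption. The names `hudson_fst` and `bootstrap_se` and the tuple layout are hypothetical.

```python
import random
from statistics import stdev

def hudson_fst(snps):
    """Hudson's FST as a ratio of summed numerators and denominators.

    snps: iterable of (p1, p2, n1, n2), the allele frequency and the number
    of sampled chromosomes in each of the two populations (assumed layout).
    """
    num = den = 0.0
    for p1, p2, n1, n2 in snps:
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

def bootstrap_se(units, n_boot=1000, seed=0):
    """Bootstrap standard error of FST, resampling units with replacement.

    units: list of SNP lists. Put one SNP in each unit for independent
    resampling, or all SNPs of a gene in one unit for a gene-wise bootstrap.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(units, k=len(units))
        estimates.append(hudson_fst(s for unit in sample for s in unit))
    return stdev(estimates)
```

Resampling whole genes keeps linked SNPs together, so the gene-wise standard error is typically the larger and more conservative of the two when SNPs within a gene are correlated.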

Mean pairwise FST components between pairs of populations using the NIEHS data. The ascertainment was performed in African Americans and a different number of individuals were used for the ascertainment. Each SNP gives an estimate for the total variance and the between population variance. The plot shows the normalized mean variances. We normalized by dividing by the mean total variance and mean between population variance, respectively, for all the SNPs in the sample (regardless of ascertainment).

Mean pairwise distances within and between Africans and European Americans based on the first two principal components. Standard errors are estimated using bootstrap. The light colors are standard errors based on a gene-wise bootstrap and the dark colors are the standard bootstrap where the SNPs are sampled independently. Ascertainment, using an increasing number of individuals, was simulated using the Asian, the Hispanic, and the African-American population, respectively, from the NIEHS resequencing data. There are 5,948 SNPs without missing data in any population.

PCA plot for the first two principal components for four populations from the NIEHS data. The left plot has no ascertainment bias. In the right plot, ascertainment was performed using all the 24 individuals from the Asian populations. All SNPs that were polymorphic in the four populations were used. Out of the 8,231 SNPs, 2,751 were ascertained in the right plot. We resampled 2,751 random SNPs without replacement from the 8,231 SNPs and performed PCA on them. None of the 1,000 resamples gave a higher distance between the European Americans than the distance between the African Americans and the Africans.

We fitted the frequency spectrum of the Celera European Americans for the synonymous SNPs using a mixture of exponentials. The spectra for the observed and the fitted data are folded.

The Celera European Americans synonymous frequency spectra for the resequencing data, the SNP chip data, and the corrected frequency spectrum. The correction assumed that two ascertainment panels were used and the SNPs were ascertained if both alleles were observed at least twice.
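
To make the idea of correcting an ascertained spectrum concrete, here is a generic inverse-probability reweighting sketch. It is not the authors' correction: it assumes a single hypothetical panel of `d` chromosomes drawn at random from the `n` typed chromosomes, whereas the caption describes two panels. Only the rule that both alleles must be seen at least twice is taken from the caption. The function names are hypothetical.

```python
from math import comb

def prob_ascertained(k, n, d, min_count=2):
    """Probability that both alleles appear at least min_count times in the panel.

    k: copies of one allele among n chromosomes; d: panel size (assumed to be
    a random subset of the n chromosomes).
    """
    hits = sum(comb(k, j) * comb(n - k, d - j)
               for j in range(min_count, d - min_count + 1))
    return hits / comb(n, d)

def correct_spectrum(observed, n, d, min_count=2):
    """Reweight observed chip counts by 1 / P(ascertained) and normalize.

    observed: dict mapping allele count k to the number of chip SNPs.
    Classes with zero ascertainment probability cannot be recovered and are dropped.
    """
    weights = {}
    for k, count in observed.items():
        p = prob_ascertained(k, n, d, min_count)
        if p > 0:
            weights[k] = count / p
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

Because the weights depend directly on the assumed panel size and ascertainment rule, a correction tuned to one chip and one sample will not necessarily carry over to another study, which is consistent with the abstract's caution about generalizing such corrections.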