How do SNP ascertainment schemes and population demographics affect inferences about population history?
Emily Jane McTavish et al. BMC Genomics. 2015.
Abstract
Background: The selection of variable sites for inclusion in genomic analyses can influence results, especially when exemplar populations are used to determine polymorphic sites. We tested the impact of ascertainment bias on the inference of population genetic parameters using empirical and simulated data representing the three major continental groups of cattle: European, African, and Indian. We simulated data under three demographic models. Each simulated data set was subjected to three ascertainment schemes: (I) random selection; (II) geographically biased selection; and (III) selection biased toward loci polymorphic in multiple groups. Empirical data comprised samples of 25 individuals representing each continental group. These cattle were genotyped for 47,506 loci from the bovine 50K SNP panel. We compared the inference of population histories for the empirical and simulated data sets across different ascertainment conditions using F_ST and principal components analysis (PCA).
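The three ascertainment schemes can be sketched as filters applied to a pool of candidate loci before markers are drawn. The sketch below is illustrative only and is not the authors' simulation pipeline: the function names (`ascertain`, `toy_locus`), the toy allele counts, and the 2,000-locus pool are assumptions made for the example. The definitions of schemes II and III follow the wording of the figure captions (polymorphic within Europe; polymorphic in two or more groups).

```python
import random

GROUPS = ("European", "African", "Indian")
N_CHROMOSOMES = 50  # 25 diploid individuals per group, as in the empirical sample


def is_polymorphic(alt_count, n_chromosomes=N_CHROMOSOMES):
    """A locus is polymorphic in a sample when both alleles are observed."""
    return 0 < alt_count < n_chromosomes


def n_polymorphic_groups(locus):
    """Number of continental groups in which the locus is polymorphic."""
    return sum(is_polymorphic(locus[g]) for g in GROUPS)


def ascertain(loci, scheme, n_markers, rng):
    """Draw n_markers loci under scheme "I", "II", or "III" (hypothetical helper)."""
    if scheme == "I":      # random selection, no filter
        pool = list(loci)
    elif scheme == "II":   # geographically biased: polymorphic within Europe
        pool = [loc for loc in loci if is_polymorphic(loc["European"])]
    elif scheme == "III":  # polymorphic in two or more groups
        pool = [loc for loc in loci if n_polymorphic_groups(loc) >= 2]
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return rng.sample(pool, min(n_markers, len(pool)))


def toy_locus(rng):
    """Invented data: each group is fixed for one allele about half the time."""
    return {
        g: rng.choice([0, N_CHROMOSOMES]) if rng.random() < 0.5
        else rng.randint(1, N_CHROMOSOMES - 1)
        for g in GROUPS
    }


rng = random.Random(0)
loci = [toy_locus(rng) for _ in range(2000)]
panels = {s: ascertain(loci, s, 100, rng) for s in ("I", "II", "III")}

for scheme, panel in panels.items():
    mean_groups = sum(n_polymorphic_groups(loc) for loc in panel) / len(panel)
    print(scheme, len(panel), round(mean_groups, 2))
```

Comparing the mean number of polymorphic groups per marker across the three panels gives a quick view of how strongly each filter enriches for shared variation in this toy setting.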
Results: Bias toward shared polymorphism across continental groups is apparent in the empirical SNP data. Bias toward uneven levels of within-group polymorphism decreases estimates of F_ST between groups. Subpopulation-biased selection of SNPs changes the weighting of principal component axes and can affect inferences about proportions of admixture and population histories using PCA. PCA-based inferences of population relationships are largely congruent across types of ascertainment bias, even when ascertainment bias is strong.
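To see why marker selection can shift F_ST, it helps to compute it on a toy panel before and after filtering. The sketch below uses a simple heterozygosity-based, ratio-of-sums definition, F_ST = sum(H_T - H_S) / sum(H_T), for two equally weighted groups. This is an assumption for illustration; the abstract does not state which estimator the authors used, and the allele frequencies are invented.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs_a, freqs_b):
    """Ratio-of-sums F_ST across loci for two equally weighted groups."""
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s = (heterozygosity(pa) + heterozygosity(pb)) / 2.0  # mean within-group
        h_t = heterozygosity((pa + pb) / 2.0)                  # pooled
        numerator += h_t - h_s
        denominator += h_t
    if denominator == 0.0:
        raise ValueError("no locus is variable in the pooled sample")
    return numerator / denominator


def polymorphic_in_both(pa, pb):
    """True when the locus segregates in both groups."""
    return 0.0 < pa < 1.0 and 0.0 < pb < 1.0


# Invented allele frequencies: loci 1 and 4 are fixed differences.
group_a = [0.0, 0.2, 0.5, 1.0]
group_b = [1.0, 0.4, 0.5, 0.0]

shared_pairs = [(pa, pb) for pa, pb in zip(group_a, group_b)
                if polymorphic_in_both(pa, pb)]

fst_all = fst(group_a, group_b)
fst_shared = fst(*zip(*shared_pairs))
print(round(fst_all, 3), round(fst_shared, 3))
```

On these invented frequencies the full panel gives an F_ST of roughly 0.53, while the panel restricted to loci polymorphic in both groups gives roughly 0.02, because the filter discards the fixed differences that contribute most to between-group divergence.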
Conclusions: Analyses of ascertainment bias in genomic data have largely been conducted on human data. As genomic analyses are being applied to non-model organisms, and across taxa with deeper divergences, care must be taken to consider the potential for bias in ascertainment of variation to affect inferences. Estimates of F_ST, time of separation, and population divergence as estimated by principal components analysis can be misleading if this bias is not taken into account.
Figures

Demographic model used for simulations. Parameter values are described in Table 1. Arrows represent migration between populations. Arrow width is representative of relative values of these migration parameters under demographic scenario c. Figure created using MatPlotLib [76] in IPython [77].

Gene trees generated according to the demographic models under each of three migration scenarios. Gene trees are plotted atop one another so that patterns of variation among loci are visible. (a) No migration; (b) low taurine–indicine gene flow; and (c) low taurine–indicine gene flow, plus higher recent indicine to Africa gene flow. Figure created using the Densitree function [78] in the phangorn package [79] of R [57].

Venn diagrams illustrate the counts of polymorphisms segregating within each continental group for one example replicate. Sizes of circles and areas of overlap are approximately proportional to number of sites in those categories. Fixed differences between populations are not shown here. (A) Full data sets for the empirical data and the three simulated data sets. (B) 1,000-marker subsets of the empirical data set and the simulated data sets. Three demographic conditions were analyzed: (a) No migration; (b) low taurine–indicine gene flow; and (c) low taurine–indicine gene flow, plus higher recent indicine to Africa gene flow. In addition, three types of ascertainment sampling scheme were applied: (I) SNPs were based on random samples of loci (no bias); (II) sampled loci were selected from those that were polymorphic within Europe; and (III) sampled loci were selected from loci that were polymorphic in two or more subpopulations. Figure made using EulerAPE [80]. Counts of polymorphisms in all groups are shown in Additional file 1: Table S2.

Principal components analysis performed on 1,000-marker subsets of simulated data under three simulated migration schemes and three simulated ascertainment-bias conditions, as compared to the empirical data. (a) No migration; (b) low taurine–indicine gene flow; and (c) low taurine–indicine gene flow, plus higher recent indicine to Africa gene flow. Ascertainment schemes: (I) SNPs were based on random samples of loci (no bias); (II) sampled loci were selected from those that were polymorphic within Europe; and (III) sampled loci were selected from loci that were polymorphic in two or more subpopulations. Proportions of variation accounted for by the first two PC axes are labeled on the figure.
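The "proportion of variation" reported for a PC axis can be illustrated with a minimal PCA on a genotype matrix. The sketch below assumes plain PCA on column-centered genotypes coded as 0/1/2 alternate-allele counts, and finds the first axis by power iteration on the individual-by-individual Gram matrix. The PCA software used in the paper is not named here, so this is a generic illustration; the function `first_pc` and the four toy individuals are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each marker's mean genotype so every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_pc(matrix, iterations=200):
    """Return (PC1 scores per individual, proportion of variance on PC1)."""
    x = center_columns(matrix)
    n = len(x)
    # Gram matrix between individuals; it shares its non-zero eigenvalues
    # with the marker covariance matrix (up to a constant factor).
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    total = sum(gram[i][i] for i in range(n))  # total variance (trace)
    v = [float(i + 1) for i in range(n)]       # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n))
                     for i in range(n))
    scores = [math.sqrt(eigenvalue) * t for t in v]
    return scores, eigenvalue / total


# Invented genotypes: rows are individuals (two per group), columns are markers.
genotypes = [
    [0, 0, 2, 1],  # group A
    [0, 1, 2, 1],  # group A
    [2, 2, 0, 1],  # group B
    [2, 1, 0, 1],  # group B
]

scores, share = first_pc(genotypes)
print([round(s, 2) for s in scores], round(share, 3))
```

In this toy matrix the two groups fall on opposite sides of PC1, and PC1 carries most of the total variance. Changing which markers are included in the columns changes both the scores and that share, which is the sense in which marker selection reweights the axes.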
Similar articles
- Flegontov P, Işıldak U, Maier R, Yüncü E, Changmai P, Reich D. PLoS Genet. 2023 Sep 7;19(9):e1010931. doi: 10.1371/journal.pgen.1010931. eCollection 2023 Sep. PMID: 37676865. Free PMC article.
- Dokan K, Kawamura S, Teshima KM. Effects of single nucleotide polymorphism ascertainment on population structure inferences. G3 (Bethesda). 2021 Sep 6;11(9):jkab128. doi: 10.1093/g3journal/jkab128. PMID: 33871576. Free PMC article.
- Malomane DK, Reimer C, Weigend S, Weigend A, Sharifi AR, Simianer H. BMC Genomics. 2018 Jan 5;19(1):22. doi: 10.1186/s12864-017-4416-9. PMID: 29304727. Free PMC article.
- Lachance J, Tishkoff SA. SNP ascertainment bias in population genetic analyses: why it is important, and how to correct it. Bioessays. 2013 Sep;35(9):780-6. doi: 10.1002/bies.201300014. Epub 2013 Jul 9. PMID: 23836388. Free PMC article. Review.
- Nielsen R. Population genetic analysis of ascertained SNP data. Hum Genomics. 2004 Mar;1(3):218-24. doi: 10.1186/1479-7364-1-3-218. PMID: 15588481. Free PMC article. Review.
Cited by
- Rezende VB, Congrains C, Lima AL, Campanini EB, Nakamura AM, Oliveira JL, Chahad-Ehlers S, Junior IS, Alves de Brito R. G3 (Bethesda). 2016 Oct 13;6(10):3283-3295. doi: 10.1534/g3.116.030486. PMID: 27558666. Free PMC article.
- Edea Z, Dessie T, Dadi H, Do KT, Kim KS. Front Genet. 2017 Dec 22;8:218. doi: 10.3389/fgene.2017.00218. eCollection 2017. PMID: 29312441. Free PMC article.
- Jones MR, Good JM. Targeted capture in evolutionary and ecological genomics. Mol Ecol. 2016 Jan;25(1):185-202. doi: 10.1111/mec.13304. Epub 2015 Jul 30. PMID: 26137993. Free PMC article. Review.
- Fitak RR, Rinkevich SE, Culver M. J Hered. 2018 May 11;109(4):372-383. doi: 10.1093/jhered/esy009. PMID: 29757430. Free PMC article.
- Wang Y, Segelke D, Emmerling R, Bennewitz J, Wellmann R. G3 (Bethesda). 2017 Dec 4;7(12):4009-4018. doi: 10.1534/g3.117.300272. PMID: 29089375. Free PMC article.
References
- Brumfield RT, Beerli P, Nickerson DA, Edwards SV. The utility of single nucleotide polymorphisms in inferences of population history. Trends Ecol Evol. 2003;18:249–56. doi: 10.1016/S0169-5347(03)00018-1.