Genome scans for selection and introgression based on k-nearest neighbour techniques

Bastian Pfeifer et al. Mol Ecol Resour. 2020 Nov.

Abstract

In recent years, genome-scan methods have been extensively used to detect local signatures of selection and introgression. Most of these methods are designed for only one of the two cases, which may impair the study of combined cases. Here, we introduce a series of versatile genome-scan methods applicable to both the detection of selection and the detection of introgression. The proposed approaches are based on nonparametric k-nearest neighbour (kNN) techniques and incorporate the pairwise fixation index (FST) and pairwise nucleotide differences (dxy) as features. We benchmark our methods using a wide range of simulation scenarios with varying parameters, such as recombination rates, population background histories, selection strengths, the proportion of introgression and the time of gene flow. We find that kNN-based methods perform remarkably well compared with the state of the art. Finally, we demonstrate how to perform kNN-based genome scans on real-world genomic data using the population genomics R package PopGenome.
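The idea of scoring genomic regions by their kNN distances in a feature space of pairwise FST values can be illustrated with a minimal sketch. It assumes one common kNN outlier score, the mean Euclidean distance to the k nearest neighbours; the exact scores used in the paper may differ. The function name `knn_scores` and the toy FST values are hypothetical and serve illustration only.

```python
import math

def knn_scores(features, k):
    """Mean Euclidean distance from each window to its k nearest neighbours.

    Assumption: this is one common kNN outlier score, not necessarily
    the score used in the paper.
    """
    if not 1 <= k < len(features):
        raise ValueError("k must satisfy 1 <= k < number of windows")
    scores = []
    for i, x in enumerate(features):
        # Distances to every other window, smallest first.
        dists = sorted(math.dist(x, y) for j, y in enumerate(features) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical toy data: one row per genomic window; the columns are the
# pairwise FST values for the population pairs (P1-P2, P1-P3, P2-P3).
windows = [
    (0.10, 0.30, 0.31),
    (0.11, 0.29, 0.30),
    (0.09, 0.31, 0.32),
    (0.12, 0.30, 0.29),
    (0.45, 0.60, 0.30),  # elevated FST in both pairs involving P1
]

scores = knn_scores(windows, k=2)
# Index of the window that lies farthest from its neighbours.
top = max(range(len(windows)), key=scores.__getitem__)
print(top, [round(s, 3) for s in scores])
```

In this toy example the last window is the most isolated point in feature space and therefore receives the highest score; dxy values could be appended as additional columns in the same way.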

Keywords: adaptation; genome scans; introgression; k-nearest neighbours.

© 2020 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.

Figures

Figure 1

A graphical illustration of local adaptation. A three-population genealogy with selection introduced ts × 4Ne generations ago in population P1.

Figure 2

A graphical illustration of introgression. A three-population species tree with a unidirectional introgression event from the ancestral population P3 to population P2, introduced tGF × 4Ne generations ago. The proportion of introgression is indicated by f.

Figure 3

Local adaptation: varying the coalescent time to population P3 (t123). Results for the kNN-based methods using pairwise FST as features, for 100 sequentially sampled k values, compared with the accuracy of FST, pcadapt, FLK and BlockFeST. The recombination rate is r = 0.001, and the number of SNPs per region is 50. (a) The simulations are based on a star-shaped genealogy (t12 = t123 = 0.1 × 4Ne). The coalescent time to population P3 is (b) t123 = 0.5 × 4Ne, (c) t123 = 0.7 × 4Ne and (d) t123 = 0.9 × 4Ne generations ago. The expected AUC of a random classifier is 0.5.

Figure 4

Detecting selection with a computed k. The kNN methods with pairwise FST as features, compared with FST, pcadapt, FLK and BlockFeST. The recombination rate is r = 0.001, and the number of SNPs per region is 50. (a, b) Varying the coalescent time with population P3 (t123 = [0.1, 0.3, 0.5, 0.7, 0.9] × 4Ne generations ago). The realized mean FST over all regions is FST = [0.17, 0.31, 0.42, 0.50, 0.55]. (c, d) Varying the recombination rate (r = [0.001, 0.005, 0.01, 0.05]). The coalescent time with population P3 is t123 = 0.7 × 4Ne generations ago. The realized mean FST over all regions is FST = [0.31, 0.32, 0.32, 0.31]. The expected values for a random classifier are AUC = 0.5 and PR-AUC = 50/1,000 = 0.05.
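The random-classifier baselines quoted in the caption can be checked numerically. The sketch below assumes that 50/1,000 means 50 positive regions among 1,000, and it uses average precision as a stand-in for PR-AUC; both are assumptions, and the helper names `roc_auc` and `average_precision` are hypothetical. Scores are drawn uniformly at random and the metrics are averaged over replicates.

```python
import random

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a
    random negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive
    (used here as an estimate of PR-AUC)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits

rng = random.Random(0)              # fixed seed for reproducibility
labels = [1] * 50 + [0] * 950       # assumed: 50 positives among 1,000 regions
aucs, aps = [], []
for _ in range(50):                 # 50 replicates of a random classifier
    scores = [rng.random() for _ in labels]
    aucs.append(roc_auc(scores, labels))
    aps.append(average_precision(scores, labels))

mean_auc = sum(aucs) / len(aucs)
mean_ap = sum(aps) / len(aps)
print(round(mean_auc, 3), round(mean_ap, 3))
```

The mean AUC should land near 0.5 and the mean average precision near the share of positives, 0.05. In finite samples the average precision of random scores tends to sit slightly above that share.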

Figure 5

Varying the fraction of introgression (f). Results for the kNN-based methods using pairwise FST as features, for 100 sequentially sampled k values. Coalescent times are t12 = 0.1 × 4Ne and t123 = 0.9 × 4Ne generations ago. Recent introgression is introduced tGF = 0.01 × 4Ne generations ago, and the recombination rate is set to r = 0.01 in all simulations. The outcome of the kNN-based methods is compared with FST, D3 and |D3|. The fraction of introgression is (a) f = 0.9, (b) f = 0.7, (c) f = 0.5 and (d) f = 0.3. The expected AUC of a random classifier is 0.5.

Figure 6

Detecting introgression with a computed k. The accuracy of the kNN methods with pairwise FST as features, compared with FST, D3 and |D3|. The recombination rate is r = 0.01 in all simulations. (a, b) Varying the fraction of introgression (f = [0.9, 0.7, 0.5, 0.3]). Coalescent times are t12 = 0.1 × 4Ne and t123 = 0.9 × 4Ne generations ago, and recent introgression is introduced tGF = 0.01 × 4Ne generations ago. The realized mean FST over all regions is FST = [0.50, 0.50, 0.50, 0.50]. (c, d) Varying the time of gene flow (tGF = [0.1, 0.3, 0.5, 0.8] × 4Ne) with a fixed fraction of introgression of f = 0.7. Coalescent times are t12 = 1 × 4Ne and t123 = 2 × 4Ne generations ago. The realized mean FST over all regions is FST = [0.75, 0.75, 0.75, 0.75]. The expected values for a random classifier are AUC = 0.5 and PR-AUC = 50/1,000 = 0.05.

Figure 7

Genome-scan plots of human chromosome 2. (a–h) The kNN scores are shown along human chromosome 2 based on 100-kb consecutive sliding windows. Red and orange dots are outliers identified by the kNN methods (0.005-quantile of the scores). Red dots mark outliers on which all methods agree; orange dots mark outliers identified by only some of the methods. (i) A diagnostic plot showing the pairwise rank correlations of the kNN scores while varying the parameter k.
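The two steps described in this caption, flagging outlier windows by a score quantile and comparing score vectors obtained with different k through a rank correlation, can be sketched as follows. The sketch assumes that the outliers are the 0.5% of windows with the highest scores and that the rank correlation is Spearman's; neither detail is stated in the caption. All function names and score values are hypothetical.

```python
import math

def top_fraction_outliers(scores, fraction=0.005):
    """Indices of the highest-scoring windows (at least one window).
    Assumption: outliers are taken from the upper tail of the scores."""
    n_out = max(1, math.ceil(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_out])

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical kNN scores for six windows under two choices of k.
scores_k5 = [0.020, 0.035, 0.030, 0.410, 0.025, 0.050]
scores_k10 = [0.040, 0.055, 0.060, 0.520, 0.045, 0.070]

outliers_k5 = top_fraction_outliers(scores_k5)
outliers_k10 = top_fraction_outliers(scores_k10)
# Windows flagged under every k, analogous to the red dots in the figure.
consensus = sorted(set(outliers_k5) & set(outliers_k10))
rho = spearman(scores_k5, scores_k10)
print(consensus, round(rho, 3))
```

A rank correlation close to 1 across values of k suggests that the ranking of windows, and hence the outlier set, is not sensitive to the choice of k.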
