Genome scans for selection and introgression based on k-nearest neighbour techniques
Bastian Pfeifer et al. Mol Ecol Resour. 2020 Nov.
Abstract
In recent years, genome-scan methods have been used extensively to detect local signatures of selection and introgression. Most of these methods are designed for one case or the other, which may impair the study of regions where both occur. Here, we introduce a series of versatile genome-scan methods applicable to both cases: the detection of selection and the detection of introgression. The proposed approaches are based on nonparametric k-nearest neighbour (kNN) techniques and use the pairwise fixation index (F_ST) and pairwise nucleotide differences (d_xy) as features. We benchmark our methods on a wide range of simulation scenarios, varying parameters such as recombination rate, population background history, selection strength, the proportion of introgression and the time of gene flow. We find that the kNN-based methods perform remarkably well compared with state-of-the-art methods. Finally, we demonstrate how to perform kNN-based genome scans on real-world genomic data using the population genomics R package PopGenome.
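To make the general idea concrete, the sketch below scores genomic windows by their distance to the k nearest neighbouring windows in feature space, where each window is described by a vector of pairwise F_ST values. This is a generic kNN-distance outlier score (mean Euclidean distance to the k nearest other windows), offered as an assumption; it is not necessarily the exact scoring used in the paper or in PopGenome, and the helper `knn_scores` and the feature values are hypothetical.

```python
import math
from typing import List, Sequence


def knn_scores(features: Sequence[Sequence[float]], k: int) -> List[float]:
    """Hypothetical helper: mean Euclidean distance from each window to its
    k nearest other windows. Larger scores mean the window lies farther from
    the bulk of the genome in feature space (a candidate outlier)."""
    n = len(features)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of windows")
    scores = []
    for i, xi in enumerate(features):
        # Distances to every other window, excluding the window itself.
        dists = sorted(
            math.dist(xi, xj) for j, xj in enumerate(features) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical windows, each described by pairwise F_ST for
# (P1, P2), (P1, P3) and (P2, P3).
windows = [
    [0.10, 0.30, 0.31],
    [0.12, 0.29, 0.30],
    [0.11, 0.31, 0.32],
    [0.09, 0.30, 0.29],
    [0.60, 0.70, 0.31],  # elevated differentiation involving P1
]
scores = knn_scores(windows, k=2)
top = max(range(len(scores)), key=scores.__getitem__)
print(f"most outlying window: {top}")
```

The brute-force neighbour search is quadratic in the number of windows; it keeps the sketch short, but a genome-wide scan would normally use a spatial index or a dedicated kNN library.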
Keywords: adaptation; genome scans; introgression; k-nearest neighbours.
© 2020 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
Figures

A graphical illustration of local adaptation. A three-population genealogy with selection introduced in population P1 t_s × 4N_e generations ago.

A graphical illustration of introgression. A three-population species tree with a unidirectional introgression event from the ancestral population P3 to population P2, introduced t_GF × 4N_e generations ago. The proportion of introgression is indicated by f.
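Times in these captions are expressed in units of 4N_e generations, so converting them to generations only requires multiplying by 4N_e. A minimal sketch of that conversion, assuming a hypothetical effective population size N_e = 10,000 (not a value taken from the paper) and a hypothetical helper name:

```python
def coalescent_to_generations(t: float, ne: int) -> float:
    """Hypothetical helper: convert a time in units of 4*N_e generations into generations."""
    return t * 4 * ne


NE = 10_000  # hypothetical effective population size, for illustration only
for label, t in [("t_GF", 0.01), ("t_12", 0.1), ("t_123", 0.9)]:
    gens = coalescent_to_generations(t, NE)
    print(f"{label} = {t} x 4N_e = {gens:,.0f} generations")
```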

Local adaptation: varying the coalescent time to population P3 (t_123). Results of the kNN-based methods using pairwise F_ST as features for 100 sequentially sampled values of k, compared with the accuracy of F_ST, pcadapt, FLK and BlockFeST. The recombination rate is r = 0.001, and the number of SNPs per region is 50. (a) The simulations are based on a star-shaped genealogy (t_12 = t_123 = 0.1 × 4N_e). The coalescent time to population P3 is (b) t_123 = 0.5 × 4N_e, (c) t_123 = 0.7 × 4N_e and (d) t_123 = 0.9 × 4N_e generations ago. The expected AUC value of a random classifier is AUC = 0.5.
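Accuracy in these figures is reported as the area under the ROC curve (AUC). As a hedged sketch of how such a value can be computed from per-region scores and known simulation labels (selected vs. neutral), the rank-based (Mann-Whitney) formulation below returns the probability that a randomly chosen positive region scores higher than a randomly chosen negative one, counting ties as one half. A random classifier therefore has an expected AUC of 0.5. The helper name, labels and scores are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC as the fraction of (positive, negative)
    pairs in which the positive region receives the higher score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative region")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0, 0]  # 1 = simulated selected region
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))  # 0.875
```

The pairwise loop is quadratic but makes the probabilistic meaning of AUC explicit; a rank-sum implementation gives the same value in O(n log n).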

Detecting selection with a computed k. The kNN methods with pairwise F_ST as features, compared with F_ST, pcadapt, FLK and BlockFeST. Unless stated otherwise, the recombination rate is r = 0.001, and the number of SNPs per region is 50. (a, b) Varying the coalescent time with population P3 (t_123 = [0.1, 0.3, 0.5, 0.7, 0.9] × 4N_e generations ago). The realized mean F_ST over all regions is [0.17, 0.31, 0.42, 0.50, 0.55]. (c, d) Varying the recombination rate (r = [0.001, 0.005, 0.01, 0.05]), with the coalescent time with population P3 at t_123 = 0.7 × 4N_e generations ago. The realized mean F_ST over all regions is [0.31, 0.32, 0.32, 0.31]. The expected value of a random classifier is AUC = 0.5 and PR-AUC = 50/1,000 = 0.05.
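The PR-AUC baseline quoted above follows from the class balance: under a random ranking, precision at any depth is on average the fraction of positive regions, here 50/1,000 = 0.05. Below is a minimal sketch of average precision, one common step-wise estimator of PR-AUC; the paper may use a different estimator, and the helper name, labels and scores are invented for illustration. Tied scores are broken by input order, which is an arbitrary choice.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: mean of the precision values observed at the
    rank of each positive region, after sorting by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive region")
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Baseline quoted in the caption: 50 positive regions out of 1,000.
print(50 / 1_000)  # 0.05

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(average_precision(labels, scores))
```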

Varying the fraction of introgression (f). Results of the kNN-based methods using pairwise F_ST as features for 100 sequentially sampled values of k. Coalescent times are t_12 = 0.1 × 4N_e and t_123 = 0.9 × 4N_e generations ago. Recent introgression is introduced t_GF = 0.01 × 4N_e generations ago, and the recombination rate is set to r = 0.01 in all simulations. The kNN-based methods are compared with F_ST, D_3 and |D_3|. The fraction of introgression is (a) f = 0.9, (b) f = 0.7, (c) f = 0.5 and (d) f = 0.3. The expected AUC value of a random classifier is AUC = 0.5.
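The D_3 baseline is not defined in this section. The sketch below assumes the usual three-sample definition of Hahn and Hibbins (2019), D_3 = (d_BC - d_AC) / (d_BC + d_AC), with A and B the sister populations and C the third population, computed from per-window mean pairwise differences; |D_3| simply drops the sign. The helper name and divergence values are hypothetical.

```python
def d3(d_ac: float, d_bc: float) -> float:
    """Hypothetical helper, assuming D3 = (d_BC - d_AC) / (d_BC + d_AC).

    Without introgression, the sister populations A and B are expected to be
    equally diverged from C (D3 near 0). Gene flow between B and C lowers
    d_BC and pushes D3 negative; gene flow between A and C pushes it positive.
    """
    total = d_bc + d_ac
    if total == 0:
        raise ValueError("divergences must not both be zero")
    return (d_bc - d_ac) / total


# Hypothetical per-window mean pairwise differences with A = P1, B = P2, C = P3.
print(d3(d_ac=0.020, d_bc=0.020))  # 0.0: no asymmetry
print(d3(d_ac=0.020, d_bc=0.012))  # negative: P2 closer to P3, as with P3 -> P2 gene flow
```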

Detecting introgression with a computed k. Accuracy of the kNN methods with pairwise F_ST as features, compared with F_ST, D_3 and |D_3|. The recombination rate is r = 0.01 in all simulations. (a, b) Varying the fraction of introgression (f = [0.9, 0.7, 0.5, 0.3]). Coalescent times are t_12 = 0.1 × 4N_e and t_123 = 0.9 × 4N_e generations ago, and recent introgression is introduced t_GF = 0.01 × 4N_e generations ago. The realized mean F_ST over all regions is [0.50, 0.50, 0.50, 0.50]. (c, d) Varying the time of gene flow (t_GF = [0.1, 0.3, 0.5, 0.8] × 4N_e) with a fixed fraction of introgression of f = 0.7. Coalescent times are t_12 = 1 × 4N_e and t_123 = 2 × 4N_e generations ago. The realized mean F_ST over all regions is [0.75, 0.75, 0.75, 0.75]. The expected value of a random classifier is AUC = 0.5 and PR-AUC = 50/1,000 = 0.05.
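The captions quote a realized mean F_ST over all regions. One common per-window estimator, assumed here rather than taken from the paper, is Hudson's F_ST = 1 - (mean within-population diversity) / d_xy, which also shows how the two features named in the abstract are related. The helper names and diversity values are hypothetical, and PopGenome or the paper may compute F_ST differently.

```python
from statistics import mean
from typing import Sequence, Tuple


def hudson_fst(pi_x: float, pi_y: float, dxy: float) -> float:
    """Hypothetical helper: Hudson-style F_ST = 1 - mean(pi_x, pi_y) / d_xy."""
    if dxy <= 0:
        raise ValueError("d_xy must be positive")
    return 1.0 - ((pi_x + pi_y) / 2.0) / dxy


def mean_fst(windows: Sequence[Tuple[float, float, float]]) -> float:
    """Average of the per-window estimates, i.e. a mean F_ST over all regions."""
    return mean(hudson_fst(*w) for w in windows)


# Hypothetical windows: (pi in population 1, pi in population 2, d_xy between them).
windows = [(0.010, 0.010, 0.020), (0.008, 0.012, 0.025)]
print(mean_fst(windows))
```

Averaging per-window ratios, as done here, is one choice; some studies instead take the ratio of the summed numerators and denominators across windows, which weights windows differently.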

Genome-scan plots of human chromosome 2. (a–h) kNN scores along human chromosome 2, computed in consecutive 100-kb sliding windows. Red and orange dots are outliers identified by the kNN methods (0.005-quantile of the scores): red dots mark outliers on which all methods agree, and orange dots mark the remaining ones. (i) Diagnostic plot of the pairwise rank correlations of the kNN scores as the parameter k varies.
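Two steps from the chromosome-2 figure can be sketched in plain Python: flagging the highest-scoring windows as outliers, and checking how stable the window ranking is across values of k with a rank correlation. This assumes that the 0.005-quantile threshold means keeping the top 0.5% of windows and that the rank correlation is Spearman's; the helper names and score vectors are hypothetical, not output from PopGenome.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score vectors of equal length >= 2")
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("rank correlation is undefined for constant scores")
    return cov / math.sqrt(va * vb)


def top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the top ceil(fraction * n) highest-scoring windows, at least one."""
    n_keep = max(1, math.ceil(fraction * len(scores)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]


# Hypothetical kNN scores for the same five windows computed with two values of k.
scores_k5 = [0.10, 0.20, 0.15, 0.90, 0.30]
scores_k10 = [0.12, 0.25, 0.14, 0.80, 0.35]
print(spearman(scores_k5, scores_k10))  # 1.0: both values of k rank windows identically
print(top_fraction(scores_k5, 0.005))   # [3]
```

High rank correlations across a range of k suggest that the outlier calls are not sensitive to the exact choice of k.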
Similar articles
- Estimates of introgression as a function of pairwise distances.
  Pfeifer B, Kapan DD. BMC Bioinformatics. 2019 Apr 23;20(1):207. doi: 10.1186/s12859-019-2747-z. PMID: 31014244.
- A new method to scan genomes for introgression in a secondary contact model.
  Geneva AJ, Muirhead CA, Kingan SB, Garrigan D. PLoS One. 2015 Apr 14;10(4):e0118621. doi: 10.1371/journal.pone.0118621. eCollection 2015. PMID: 25874895.
- VolcanoFinder: Genomic scans for adaptive introgression.
  Setter D, Mousset S, Cheng X, Nielsen R, DeGiorgio M, Hermisson J. PLoS Genet. 2020 Jun 18;16(6):e1008867. doi: 10.1371/journal.pgen.1008867. eCollection 2020 Jun. PMID: 32555579.
- Manel S, Perrier C, Pratlong M, Abi-Rached L, Paganini J, Pontarotti P, Aurelle D. Mol Ecol. 2016 Jan;25(1):170-84. doi: 10.1111/mec.13468. Epub 2015 Dec 17. PMID: 26562485. Review.
- Controlling false discoveries in genome scans for selection.
  François O, Martins H, Caye K, Schoville SD. Mol Ecol. 2016 Jan;25(2):454-69. doi: 10.1111/mec.13513. Epub 2016 Jan 18. PMID: 26671840. Review.
Cited by
- Yin X, Martinez AS, Sepúlveda MS, Christie MR. BMC Genomics. 2021 Apr 14;22(1):269. doi: 10.1186/s12864-021-07553-x. PMID: 33853517.
- Demography and selection analysis of the incipient adaptive radiation of a Hawaiian woody species.
  Izuno A, Onoda Y, Amada G, Kobayashi K, Mukai M, Isagi Y, Shimizu KK. PLoS Genet. 2022 Jan 21;18(1):e1009987. doi: 10.1371/journal.pgen.1009987. eCollection 2022 Jan. PMID: 35061669.
- A novel hybrid algorithm based on Harris Hawks for tumor feature gene selection.
  Liu J, Feng H, Tang Y, Zhang L, Qu C, Zeng X, Peng X. PeerJ Comput Sci. 2023 Feb 13;9:e1229. doi: 10.7717/peerj-cs.1229. eCollection 2023. PMID: 37346505.
- Ghost lineages can invalidate or even reverse findings regarding gene flow.
  Tricou T, Tannier E, de Vienne DM. PLoS Biol. 2022 Sep 14;20(9):e3001776. doi: 10.1371/journal.pbio.3001776. eCollection 2022 Sep. PMID: 36103518.
- Ma LJ, Cao LJ, Chen JC, Tang MQ, Song W, Yang FY, Shen XJ, Ren YJ, Yang Q, Li H, Hoffmann AA, Wei SJ. Mol Biol Evol. 2024 Mar 1;41(3):msae044. doi: 10.1093/molbev/msae044. PMID: 38401527.