PopGenome: an efficient Swiss army knife for population genomic analyses in R
Bastian Pfeifer et al. Mol Biol Evol. 2014 Jul.
Abstract
Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analysis methods into the PopGenome framework. PopGenome and R are freely available from CRAN (http://cran.r-project.org/) for all major operating systems under the GNU General Public License.
Keywords: population genomics; single-nucleotide polymorphisms; software.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Figures

Diversity statistics for Arabidopsis thaliana chromosome 1. Data from the 1001 genomes project website (1001genomes.org) were analyzed in consecutive 10-kb windows. (A) Nucleotide diversity, (B) haplotype diversity, (C) fixation index (Hudson’s FST), contrasting one population against all other individuals. Each line corresponds to one population (see legend in panel [A]). Lines were smoothed using spline interpolation. The black bars around 15 Mb mask the centromere.
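
The three statistics in this caption can be illustrated with a minimal Python sketch. It is not PopGenome code and does not use the PopGenome API. It assumes biallelic SNPs coded as '0'/'1' strings (one string per haplotype), uses hypothetical function names, and takes Hudson's FST in one common formulation, 1 - Hw/Hb, with Hw the unweighted mean of the within-population pairwise differences and Hb the mean between-population pairwise difference. Whether PopGenome applies exactly this weighting is not stated in the text above.

```python
from collections import Counter
from itertools import combinations, product


def pairwise_diff(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(haps):
    """Mean number of pairwise differences among haplotypes of one sample."""
    pairs = list(combinations(haps, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(haps1, haps2):
    """Mean number of pairwise differences between two samples."""
    pairs = list(product(haps1, haps2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def haplotype_diversity(haps):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haps)
    freqs = [count / n for count in Counter(haps).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def hudson_fst(haps1, haps2):
    """FST as 1 - Hw/Hb (assumed formulation, see lead-in); NaN if Hb is 0."""
    h_w = (mean_within(haps1) + mean_within(haps2)) / 2
    h_b = mean_between(haps1, haps2)
    return 1 - h_w / h_b if h_b > 0 else float("nan")


def window_stats(positions, pop1, pop2, width):
    """Statistics for consecutive, non-overlapping windows of `width` bp.

    `positions` holds the 1-based coordinate of each SNP column.
    """
    n_windows = (max(positions) - 1) // width + 1
    results = []
    for w in range(n_windows):
        # Columns whose SNP position falls into window w.
        cols = [i for i, p in enumerate(positions) if (p - 1) // width == w]
        sub1 = ["".join(h[i] for i in cols) for h in pop1]
        sub2 = ["".join(h[i] for i in cols) for h in pop2]
        results.append({
            "start": w * width + 1,
            # Per-site nucleotide diversity over all haplotypes in the window.
            "pi": mean_within(sub1 + sub2) / width,
            "hap_div": haplotype_diversity(sub1 + sub2),
            "fst": hudson_fst(sub1, sub2),
        })
    return results


# Toy data (invented for illustration): 4 SNPs, two populations of 3 haplotypes.
toy_positions = [120, 480, 1500, 1900]
toy_pop_a = ["0000", "1000", "1100"]
toy_pop_b = ["0011", "0011", "0111"]

for row in window_stats(toy_positions, toy_pop_a, toy_pop_b, width=1000):
    print(row)
```

The all-pairs loops keep the sketch short but scale quadratically with sample size; a genome-scale analysis would instead work from per-site allele counts.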

Tajima’s D calculated across nonsynonymous coding sites of exons in the human MHC region on chromosome 6. Each data point in (A) and (B) represents one exon; HLA type I and type II exons are shown in red. (A) Tajima’s D of a Tuscan population (117 individuals), plotted along chr. 6. (B) Comparison of Tajima’s D between a Tuscan (117 individuals) and a Yoruba (229 individuals) population. (C) Distribution (density curves) of the Tajima’s D values in (A) for MHC (red) and non-MHC exons (black). The blue curve displays the distribution of neutral values from coalescent simulations with Hudson’s MS based on all SNPs in the MHC region. Data from 1000genomes.org.
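
For readers unfamiliar with the statistic, Tajima's D can be sketched in Python as follows. This is an illustration of the standard formula (Tajima 1989) and not PopGenome's implementation. The input format ('0'/'1' haplotype strings) and the function name are assumptions made for the example, and the coalescent simulations used for the neutral distribution in panel (C) are not covered.

```python
import math


def tajimas_d(haps):
    """Tajima's D for biallelic sites coded as '0'/'1' haplotype strings.

    Returns NaN when the statistic is undefined (fewer than 4 haplotypes
    or no segregating sites).
    """
    n = len(haps)
    n_sites = len(haps[0])
    # Count of '1' alleles in each column; keep only segregating sites.
    counts = [sum(h[i] == "1" for h in haps) for i in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return float("nan")

    # Mean number of pairwise differences, summed over segregating sites.
    k_hat = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (k_hat - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples (invented): an excess of rare variants gives D < 0,
# an excess of intermediate-frequency variants gives D > 0.
print(tajimas_d(["1000", "0100", "0010", "0001"]))
print(tajimas_d(["1111", "1111", "0000", "0000"]))
```

Applied per exon, as in the figure, the function would be called once for each exon's set of nonsynonymous sites.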

Comparison of PopGenome with existing software for population genetics and population genomics analyses. Symbols reflect the breadth of the implemented functionalities: ++, broad; +, limited; −, nonexistent. Details on the criteria used for assignment to the breadth classes are given in supplementary table S1, Supplementary Material online.