A haplotype method detects diverse scenarios of local adaptation from genomic sequence variation

Mol Ecol. 2016 Jul;25(13):3081-100.

doi: 10.1111/mec.13671. Epub 2016 Jun 6.

Jeremy D Lange et al. Mol Ecol. 2016 Jul.

Abstract

Identifying genomic targets of population-specific positive selection is a major goal in several areas of basic and applied biology. However, it is unclear how often such selection should act on new mutations versus standing genetic variation or recurrent mutation, and furthermore, favoured alleles may either become fixed or remain variable in the population. Very few population genetic statistics are sensitive to all of these modes of selection. Here, we introduce and evaluate the Comparative Haplotype Identity statistic (χMD), which assesses whether pairwise haplotype sharing at a locus in one population is unusually large compared with another population, relative to genomewide trends. Using simulations that emulate human and Drosophila genetic variation, we find that χMD is sensitive to a wide range of selection scenarios, and for some very challenging cases (e.g. partial soft sweeps), it outperforms other two-population statistics. We also find that, as with FST, our haplotype approach has the ability to detect surprisingly ancient selective sweeps. Particularly for the scenarios resembling human variation, we find that χMD outperforms other frequency- and haplotype-based statistics for soft and/or partial selective sweeps. Applying χMD and other between-population statistics to published population genomic data from D. melanogaster, we find both shared and unique genes and functional categories identified by each statistic. The broad utility and computational simplicity of χMD will make it an especially valuable tool in the search for genes targeted by local adaptation.
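
As a rough illustration of the idea described in the abstract, the Python sketch below compares pairwise haplotype sharing between two populations window by window and rescales each window by the genome-wide trend. It is not the published definition of χMD. The exact-identity sharing measure, the ratio form, the median normalization, the pseudocount, the toy data and all function names are assumptions made for this example.

```python
import statistics
from itertools import combinations


def mean_pairwise_identity(haplotypes):
    """Fraction of haplotype pairs that are identical across the whole window.

    Assumption: "sharing" is scored as exact identity of two haplotype strings.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes per window")
    identical = sum(1 for a, b in pairs if a == b)
    return identical / len(pairs)


def comparative_identity(pop_a_windows, pop_b_windows, pseudocount=1e-6):
    """Per-window sharing ratio (A over B), rescaled by the genome-wide median.

    Assumption: a ratio with a small pseudocount stands in for "unusually
    large compared with another population, relative to genomewide trends".
    """
    ratios = [
        (mean_pairwise_identity(a) + pseudocount)
        / (mean_pairwise_identity(b) + pseudocount)
        for a, b in zip(pop_a_windows, pop_b_windows)
    ]
    genome_wide = statistics.median(ratios)
    return [r / genome_wide for r in ratios]


# Toy data: three windows, four haplotypes per population (hypothetical).
pop_a = [
    ["0101", "0101", "0101", "0110"],  # elevated sharing in population A
    ["0011", "0101", "1100", "0011"],
    ["1111", "0000", "1010", "1111"],
]
pop_b = [
    ["0101", "0101", "1001", "0011"],
    ["0011", "0101", "1100", "0011"],
    ["1111", "0000", "1010", "1111"],
]

scores = comparative_identity(pop_a, pop_b)
top_window = max(range(len(scores)), key=scores.__getitem__)
print("window with the largest score:", top_window)
```

In this toy input only the first window has more sharing in population A than in population B, so it receives the largest score, while windows that match the genome-wide trend score close to 1.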

Keywords: haplotypes; natural selection; partial sweeps; selective sweeps; simulation; soft sweeps.

© 2016 John Wiley & Sons Ltd.

Figures

Figure 1

Power of each statistic for complete hard sweeps, for high Ne (top) and low Ne cases (bottom). Note the difference in selection initiation times on the x axes.

Figure 2

Power of each statistic tested for partial hard sweeps, for high Ne (top) and low Ne cases (bottom).

Figure 3

For complete soft sweeps, the top two panels depict the power of each tested statistic. The bottom two panels depict the number of unique adaptations of the derived allele at the time of sampling, to help distinguish the softness of the sweep (where a value close to 1 indicates mostly hard sweeps). Note the change in scale of the x axes between the two Ne cases simulated (left and right).

Figure 4

Heat map depicting power for each statistic for partial soft sweeps. The key refers to powers ranging from 0 to 1. The x axis represents the number of copies of the beneficial allele in the population when the populations split. Note the change in x axes between the two Ne cases (starting frequency per 10,000 or per 1,000). The y axis represents the ending allele frequency at sampling.

Figure 5

Depicted here is the power for scenarios with simulated bottlenecks. The left panels depict hard sweeps, with varying strengths of bottlenecks indicated on the x axis. The right panels depict a single bottleneck strength (0.05) with varying starting allele frequencies. Additional cases are summarized in Table S5.

Figure 6

Migration was simulated for a subset of scenarios. The high levels of migration that affected statistical performance were sufficient to prevent fixed differences at the target site. Allele frequencies at sampling for both populations are shown below each migration rate.

Figure 7

This heat map depicts power of the χMD statistic as a function of allele frequency threshold (minimum frequency of allele to be included in analysis) and the time (in coalescent units) since the initiation of a complete hard selective sweep. The exclusion of all but intermediate frequency alleles yields surprising power to detect very ancient sweeps.

Figure 8

Sample size effects on each statistic. Bottleneck strength in the high Ne case is 0.01 while in the low Ne case it is 0.025. Ending frequency of partial hard sweeps is 0.3. Starting allele frequency for the complete soft sweep is 0.001 in the high Ne case and 0.02 in the low Ne case.

Figure 9

For selected sweep scenarios, this heat map shows χMD power for differing window lengths and threshold proportions (the fraction of a window that must be identical between two haplotypes).
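
The threshold proportion in this caption can be pictured with a small Python sketch. It takes one possible reading, in which two haplotypes count as sharing a window when their longest run of consecutive matching sites covers at least the threshold proportion of the window. The run-based reading, the use of site counts rather than physical length, and both function names are assumptions for illustration, not the paper's stated procedure.

```python
def longest_identical_fraction(hap_a, hap_b):
    """Longest run of consecutive matching sites, as a fraction of the window.

    Assumption: window length is measured in sites, not base pairs.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotypes must be non-empty and of equal length")
    longest = current = 0
    for x, y in zip(hap_a, hap_b):
        current = current + 1 if x == y else 0  # reset the run at a mismatch
        longest = max(longest, current)
    return longest / len(hap_a)


def counts_as_shared(hap_a, hap_b, threshold_proportion):
    """True if the identical run covers at least the threshold proportion."""
    return longest_identical_fraction(hap_a, hap_b) >= threshold_proportion


# Two hypothetical 10-site haplotypes that differ at a single site.
hap_1 = "0101010101"
hap_2 = "0101010111"
print(counts_as_shared(hap_1, hap_2, 0.75))
print(counts_as_shared(hap_1, hap_2, 0.90))
```

Under this reading, a lower threshold tolerates a mismatch near the window edge, while a higher threshold demands near-complete identity, which is the kind of trade-off the heat map explores together with window length.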

Figure 10

For a subset of sweep scenarios, this figure illustrates the decay of all three statistics’ power by distance (kilobases on the x axis). In the non-bottleneck complete hard sweep, the high Ne populations split at 0.5 time units in the past and selection (s = 0.001) began at 0.2 time units in the past. In the low Ne case, the populations split at 0.2 time units in the past and selection (s = 0.01) began immediately. The bottleneck strength is 0.05 in the high Ne case and 0.1 in the low Ne case. In both cases of the partial hard sweep, the ending allele frequencies were 0.5. In the complete soft sweep cases for both populations, starting frequency was 0.001. In the partial soft sweep for the high Ne case, starting allele frequency was 0.0001 and ending allele frequency was 0.5. For the low Ne case, the beneficial allele started at a frequency of 0.001 and ended at 0.5.

Figure 11

The power of four single population statistics was calculated for an older complete hard sweep, a partial hard sweep, a complete soft sweep, and a partial soft sweep. Note that simulation parameters differ between the high Ne and low Ne cases (Materials and Methods).

Figure 12

The top outlier regions and flanking windows for the empirical analysis of χMD, XP-EHH, and FST are shown. Above, the χMD outlier resides within a transcript region of the insulin receptor gene (InR alternative transcripts are shown). Below, XP-EHH and FST reached their maxima in the same outlier region (at adjacent windows), within a cluster of cuticle-related genes.
