Global landscape of recent inferred Darwinian selection for Homo sapiens

  • Sun Jan 01 2006

Comparative Study

Proc Natl Acad Sci U S A. 2006 Jan 3;103(1):135-40.

doi: 10.1073/pnas.0509691102. Epub 2005 Dec 21.

Eric T Wang et al. Proc Natl Acad Sci U S A. 2006.

Abstract

By using the 1.6 million single-nucleotide polymorphism (SNP) genotype data set from Perlegen Sciences [Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, D. G., Frazer, K. A. & Cox, D. R. (2005) Science 307, 1072-1079], a probabilistic search for the landscape exhibited by positive Darwinian selection was conducted. By sorting each high-frequency allele by homozygosity, we search for the expected decay of adjacent SNP linkage disequilibrium (LD) at recently selected alleles, eliminating the need for inferring haplotype. We designate this approach the LD decay (LDD) test. By these criteria, 1.6% of Perlegen SNPs were found to exhibit the genetic architecture of selection. These results were confirmed on an independently generated data set of 1.0 million SNP genotypes (International Human Haplotype Map Phase I freeze). Simulation studies indicate that the LDD test, at the megabase scale used, effectively distinguishes selection from other causes of extensive LD, such as inversions, population bottlenecks, and admixture. The approximately 1,800 genes identified by the LDD test were clustered according to Gene Ontology (GO) categories. Based on overrepresentation analysis, several predominant biological themes are common in these selected alleles, including host-pathogen interactions, reproduction, DNA metabolism/cell cycle, protein metabolism, and neuronal function.

Figures

Fig. 1.

LD patterns surrounding DRD4 7R and G6PD V202M. The observed fraction of recombinant chromosomes (FRC) values, associated with a minor allele under selection (DRD4 7R and G6PD V202M), are plotted vs. distance. FRC is calculated assuming the selected variant arose on a single chromosome (haplotype) (8). The indicated logistic function curves are approximated as sigmoidal, indicating the increasing decay of LD with distance, with a maximum assumed value of 0.5. Only sites in one direction from the selected allele are shown. The proximal region of the DRD4 7R data is shown at increased resolution in the Inset. The approximate current Perlegen (1) data set detection limit (gray) is indicated.
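As an illustration of the curve shape described in the Fig. 1 caption, the following minimal Python sketch evaluates a logistic function that rises with distance and saturates at 0.5. The function name and the `midpoint` and `steepness` parameters are hypothetical; the source does not state the authors' parameterization or any fitted values.

```python
import math


def logistic_frc(distance, midpoint, steepness, ceiling=0.5):
    """Logistic curve rising with distance and saturating at `ceiling`.

    `midpoint` and `steepness` are hypothetical shape parameters chosen for
    illustration; they are not taken from the paper. The default ceiling of
    0.5 is the maximum value stated in the Fig. 1 caption.
    """
    return ceiling / (1.0 + math.exp(-steepness * (distance - midpoint)))


if __name__ == "__main__":
    # Print the curve at a few arbitrary distances (units are arbitrary here).
    for d in (0, 50, 100, 150, 200):
        print(d, round(logistic_frc(d, midpoint=100, steepness=0.05), 3))
```

At the midpoint the curve sits at half the ceiling, and far beyond it the value approaches 0.5, which corresponds to a neighboring site no longer being in LD with the selected allele.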

Fig. 2.

Probabilistic method for finding unusual genetic architectures. (A) Binning on major/minor alleles. Each individual is sorted based on homozygosity at the major or minor allele at site S (arrowhead). (B) Compute fraction of adjacent recombinant chromosomes. The distance (d1–d3) and FRC for each neighboring SNP is then computed and stored. This list is then used to compute the ALnLH for each site (see text). Using only homozygous individuals for the computation eliminates the need to infer haplotypes.
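The binning and FRC steps in the Fig. 2 caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded as 0, 1, or 2 copies of one allele, and FRC is assumed to be the frequency of the less common allele at the neighboring SNP among the chromosomes of the homozygous bin (which caps it at 0.5). The function names are hypothetical, and the ALnLH computation is not shown because the caption does not define it.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def frc_at_neighbor(core: Sequence[int], neighbor: Sequence[int],
                    homozygous_code: int) -> Optional[float]:
    """Assumed FRC at one neighboring SNP for one homozygous bin.

    Genotypes are allele counts (0, 1, 2). `homozygous_code` is 0 or 2 and
    selects individuals homozygous at the core site S. Only those individuals
    are used, so no haplotype inference is needed.
    """
    binned = [g for c, g in zip(core, neighbor) if c == homozygous_code]
    if not binned:
        return None  # nobody in this bin
    freq = sum(binned) / (2 * len(binned))  # allele frequency in the bin
    return min(freq, 1.0 - freq)            # less common allele, at most 0.5


def frc_profile(genotypes: Dict[int, List[int]], core_pos: int,
                homozygous_code: int) -> List[Tuple[int, float]]:
    """Return (distance from core, FRC) pairs for every neighboring SNP."""
    core = genotypes[core_pos]
    profile = []
    for pos, calls in genotypes.items():
        if pos == core_pos:
            continue
        frc = frc_at_neighbor(core, calls, homozygous_code)
        if frc is not None:
            profile.append((abs(pos - core_pos), frc))
    return sorted(profile)


if __name__ == "__main__":
    # Toy data: position -> genotype per individual (made up for illustration).
    toy = {
        1000: [2, 2, 2, 0, 1, 2],
        1500: [2, 2, 1, 0, 1, 2],
        3000: [2, 1, 1, 0, 0, 2],
    }
    print(frc_profile(toy, core_pos=1000, homozygous_code=2))
```

Running the same profile once with `homozygous_code=2` and once with `homozygous_code=0` gives the two bins from panel A, which can then be compared as in Fig. 4.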

Fig. 3.

Darwin's fingerprint. The global landscape (black lines) of recent inferred Darwinian selection for the Perlegen (PLG) and HapMap (CEU, CHB, JPT, and YRI) data sets is shown, aligned along chromosomes and genes (blue lines). A larger version of this figure is available as Fig. 9, and higher-resolution analysis can be obtained from the authors for display on the University of California at Santa Cruz Genome Browser (28).

Fig. 4.

Example of inferred selection at the Reticulon gene (RTN1), which encodes a neuroendocrine-specific protein thought to affect the formation of amyloid plaques in Alzheimer's disease (29, 30). (A) Inferred selected SNPs in the promoter region (red) are shown along with all annotated SNPs (black). (B–D) The randomness for neighboring recombinant chromosomes for the major RTN1 allele (blue) at this site exemplifies the genome average, with little long-range LD. In contrast, the minor RTN1 allele (yellow) at this site closely matches the LDD model for selection. The horizontal axis labels distance away from each centered SNP, and the vertical axis is FRC (Fig. 1). (B) Perlegen data set. (C) CEU HapMap data set. (D) African ancestry (YRI) HapMap data set. The Asian HapMap data sets resemble the CEU architecture (data not shown). Note the twofold horizontal axis scale change for the YRI display, reflecting the more rapid LDD at this site in this population.

Fig. 5.

Overrepresented GO categories are not random and represent six biological themes. A total of 407 HapMap CEU selected genes are classifiable under Biological Process GO categories. For these classified genes, 870 biological themes with positive EASE values were identified, as indicated. Six functional categories constitute 82% of the –log(EASE) scores of >0.65, indicated by colored flags. Each flag is color-coded for one of these specific categories, namely pathogen–host interaction, reproduction, DNA metabolism (including putative transcription factors), cell cycle, protein metabolism, and neuronal function.
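For readers unfamiliar with the score used in Fig. 5, here is a minimal Python sketch of an EASE-style overrepresentation score. It assumes the commonly described definition, a one-tailed Fisher exact (hypergeometric upper-tail) probability computed after removing one gene from the list's hits in the category, which makes the score more conservative. The function names and toy counts are hypothetical, and the source does not state the exact settings the authors used.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style score: upper-tail probability with one list hit removed.

    Assumption: this follows the commonly described EASE definition; it is
    not taken from the source text.
    """
    penalized_hits = max(list_hits - 1, 0)
    return hypergeom_upper_tail(penalized_hits, list_total, pop_hits, pop_total)


if __name__ == "__main__":
    # Toy counts: 2 of 2 selected genes fall in a category holding 3 of 10 genes.
    plain = hypergeom_upper_tail(2, 2, 3, 10)
    ease = ease_score(2, 2, 3, 10)
    print(plain, ease)
```

Because one hit is removed before the test, the EASE-style score is never smaller than the plain upper-tail probability, so categories supported by a single gene are penalized most.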

References

    1. Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, D. G., Frazer, K. A. & Cox, D. R. (2005) Science 307, 1072–1079.
    2. The International HapMap Consortium (2005) Nature 437, 1299–1320.
    3. Risch, N. & Merikangas, K. (1996) Science 273, 1516–1517.
    4. Zwick, M. E., Cutler, D. J. & Chakravarti, A. (2000) Annu. Rev. Genomics Hum. Genet. 1, 387–407.
    5. Reich, D. E. & Lander, E. S. (2001) Trends Genet. 17, 502–510.