Relaxed purifying selection and possibly high rate of adaptation in primate lineage-specific genes

Fri Jan 01 2010

Comparative Study

James J Cai et al. Genome Biol Evol. 2010.

Abstract

Genes in the same organism vary in the time since their evolutionary origin. Without horizontal gene transfer, young genes are necessarily restricted to a few closely related species, whereas old genes can be broadly distributed across the phylogeny. It has been shown that young genes evolve faster than old genes; however, the evolutionary forces responsible for this pattern remain obscure. Here, we classify human-chimp protein-coding genes into different age classes, according to the breadth of their phylogenetic distribution. We estimate the strength of purifying selection and the rate of adaptive selection for genes in different age classes. We find that older genes carry fewer and less frequent nonsynonymous single-nucleotide polymorphisms than younger genes, suggesting that older genes experience stronger purifying selection at the protein-coding level. We infer the distribution of fitness effects of new deleterious mutations and find that older genes have proportionally more slightly deleterious mutations and fewer nearly neutral mutations than younger genes. To investigate the role of adaptive selection of genes in different age classes, we determine the selection coefficient (gamma = 2N(e)s) of genes using the MKPRF approach and estimate the ratio of the rate of adaptive nonsynonymous substitution to synonymous substitution (omega(A)) using the DoFE method. Although the proportion of positively selected genes (gamma > 0) is significantly higher in younger genes, we find no correlation between omega(A) and gene age. Collectively, these results provide strong evidence that younger genes are subject to weaker purifying selection and more tenuous evidence that they also undergo adaptive evolution more frequently.

Figures

FIG. 1.

Protein divergence rates (Ka and Ka/KS) as a function of LS. (A) Phylogenetic profiles of human protein-coding genes in ten LS groups. Solid and open circles indicate the presence and absence of human genes in the corresponding species, respectively. Genes that are present in all 11 species (i.e., LS 1 genes) show a profile of solid circles only (vertically arranged); genes that are present in human and chimpanzee and absent in the remaining species (i.e., LS 10 genes) show solid circles for human and chimpanzee only. LS levels are labeled with circled numbers. Genes whose phylogenetic profiles do not match any of the ten given profiles were excluded from the analysis. The numbers of genes in LS groups are given in parentheses. (B) Medians of divergence rates (pooled Ka, KS, and Ka/KS derived from the Applera divergence data [Bustamante et al. 2005]) for ten LS groups. Error bars indicate 95% CIs calculated from 10,000 bootstrap replications. For individual genes, the Ka and Ka/KS values vary widely and significantly among different LS groups (χ² = 2024.91 and 1926.15, respectively, degrees of freedom [df] = 9, P << 0.001 in both cases, KW test). The difference in KS is much less substantial, albeit significant, among LS groups (χ² = 39.17, df = 9, P = 1.07 × 10^−5, KW test). Ka and Ka/KS are positively correlated with the LS values (Spearman’s ρ = 0.507 and 0.503, respectively, P << 0.001 in both cases), whereas KS shows no such correlation (Spearman’s ρ = 0.016, P = 0.182).
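
The assignment of a gene to an LS group from its presence/absence profile can be sketched as follows. This is only an illustration under stated assumptions: the ten profiles are taken to be nested (present in the k species closest to human, absent in all more distant ones), the profile is ordered from human to the most distant species, and the function name ls_level is hypothetical rather than taken from the paper.

```python
from typing import Optional, Sequence


def ls_level(profile: Sequence[bool]) -> Optional[int]:
    """Return the LS level (1 = broadest distribution) for a nested profile.

    `profile` lists presence (True) or absence (False), ordered from human
    to the most distant species. Assumption: valid profiles are nested.
    Returns None for profiles that do not fit this pattern.
    """
    n_species = len(profile)

    # Count the unbroken run of presences starting from human.
    n_present = 0
    for present in profile:
        if not present:
            break
        n_present += 1

    # A presence after the first absence breaks the nested pattern.
    if any(profile[n_present:]):
        return None
    # The gene must be present in at least human and chimpanzee.
    if n_present < 2:
        return None
    return n_species - n_present + 1


# With 11 species, presence everywhere is LS 1 and human+chimp only is LS 10.
all_species = [True] * 11
human_chimp_only = [True, True] + [False] * 9
print(ls_level(all_species), ls_level(human_chimp_only))
```

Profiles with a gap (a presence after an absence) return None here, which corresponds to the genes excluded from the LS analysis.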

FIG. 2.

Protein divergence rates (Ka and Ka/KS) as a function of PL. (A) Assignment of original phylostrata (obtained from [Domazet-Loso and Tautz 2008]) into nine PL groups. (B) Median values of divergence rates (pooled Ka, KS, and Ka/KS derived from the Applera divergence data [Bustamante et al. 2005]) for nine PL groups. Error bars indicate 95% CIs calculated from 10,000 bootstrap replications. For individual genes, the Ka and Ka/KS values vary widely and significantly among different PL groups (χ² = 1177.36 and 1120.65, respectively, degrees of freedom [df] = 8, P << 0.001 in both cases, KW test). The difference in KS is much less substantial, albeit significant, among PL groups (χ² = 126.23, df = 8, P << 0.001, KW test). Both Ka and Ka/KS are positively correlated with PL (Spearman’s ρ = 0.215 and 0.206, respectively, P << 0.001 in both cases), whereas KS shows a much weaker correlation (Spearman’s ρ = 0.064, P = 1.19 × 10^−13).
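
The 95% CIs on group medians could be obtained with a percentile bootstrap, sketched below. The caption only states that 10,000 bootstrap replications were used; the percentile method, the function name bootstrap_median_ci, and the example values are assumptions made for illustration.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_median_ci(values: Sequence[float], n_boot: int = 10_000,
                        level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the median of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)

    # Resample with replacement and record the median of each replicate.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )

    # Take the equal-tailed percentiles of the bootstrap distribution.
    tail = (1.0 - level) / 2.0
    lo = medians[int(tail * n_boot)]
    hi = medians[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lo, hi


# Made-up Ka/KS values for one hypothetical group of genes.
example = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45]
print(bootstrap_median_ci(example, n_boot=2000))
```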

FIG. 3.

Protein divergence rates (Ka and Ka/KS) as a function of the number of GLs. (A) Phylogenetic profiles of human genes that are present in human, chimpanzee, and yeast but vary in their presence or absence in intermediate species (such as mouse, cow, and chicken). The same notation is used as in figure 1. The number of GLs is counted in species between human and yeast. (B) Median values of divergence rates (pooled Ka, KS, and Ka/KS derived from the Applera divergence data [Bustamante et al. 2005]) for groups of genes whose loss counts are 0, 1, 2, and ≥3. Error bars indicate 95% CIs calculated from 10,000 bootstrap replications. For individual genes, the Ka and Ka/KS values vary marginally significantly among different GL groups (χ² = 16.04 and 15.85, respectively, degrees of freedom [df] = 3, P = 0.001 in both cases, KW test). The difference in KS is not significant among GL groups (χ² = 0.55, df = 3, P = 0.908, KW test). Both Ka and Ka/KS are positively correlated with GL (Spearman’s ρ = 0.068 and 0.067, respectively, P < 0.001 in both cases), whereas KS shows no correlation (Spearman’s ρ = 0.001, P = 0.939).
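
Binning genes by loss count can be sketched as below. The sketch assumes that each intermediate species lacking the gene counts as one loss; the paper may count loss events differently (for example, on the species tree), and the function name gl_group is hypothetical.

```python
from typing import Sequence


def gl_group(intermediate_presence: Sequence[bool]) -> str:
    """Return the GL group label ("0", "1", "2", or ">=3") for one gene.

    `intermediate_presence` holds presence (True) or absence (False) in the
    species between human and yeast. Assumption: one absence = one loss.
    """
    losses = sum(1 for present in intermediate_presence if not present)
    return str(losses) if losses < 3 else ">=3"


# Hypothetical gene that is missing from two intermediate species.
print(gl_group([True, False, True, True, False, True]))
```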

FIG. 4.

Median Ka/KS as a function of LS and PL for slowly evolving and fast-evolving genes. (A) Median Ka/KS for ten LS groups; (B) Median Ka/KS for nine PL groups. Genes are grouped into slowly evolving (Ka ≤ 0.007) and fast-evolving (Ka > 0.007) classes.

FIG. 5.

Polymorphism rates (A*, S*, and A*/S*) as a function of LS. Results are derived from three data sets. (A) Applera SNPs (Bustamante et al. 2005). Spearman’s ρ = 0.964 and 0.952 (both P < 0.001), for the correlation of LS levels with A* and A*/S*, respectively. (B) Validated SNPs in dbSNP 126. Spearman’s ρ = 0.803 and 0.891 (P < 0.001 and 0.005), for the correlation of LS values with A* and A*/S*, respectively. (C) Perlegen SNPs (Hinds et al. 2005). Spearman’s ρ = 0.952 and 0.830 (P < 0.001 and P = 0.006), for the correlation of LS values with A* and A*/S*, respectively. Error bars indicate 95% CIs calculated from the 10,000 bootstrap replications.
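
The rank correlations reported here are Spearman's ρ, which can be computed as the Pearson correlation of the ranks. The sketch below is a plain standard-library version with average ranks for ties; the function names and the example numbers are illustrative and are not the paper's data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the block of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up per-group values for ten hypothetical LS levels.
ls_levels = list(range(1, 11))
a_star = [0.8, 0.9, 1.0, 1.0, 1.2, 1.1, 1.4, 1.5, 1.7, 2.0]
print(spearman_rho(ls_levels, a_star))
```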

FIG. 6.

Proportions of SNPs with a low-frequency derived allele (DAF < 0.15) in genes of ten LS groups. Results derived from Applera data for both nSNPs and sSNPs are shown here.
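
The quantity plotted here is the share of SNPs whose derived allele frequency falls below the 0.15 cutoff. A minimal sketch, with a hypothetical function name and made-up frequencies:

```python
from typing import Sequence


def low_freq_fraction(dafs: Sequence[float], threshold: float = 0.15) -> float:
    """Fraction of SNPs with derived allele frequency strictly below threshold."""
    if not dafs:
        raise ValueError("need at least one SNP")
    return sum(1 for daf in dafs if daf < threshold) / len(dafs)


# Made-up derived allele frequencies for the SNPs of one hypothetical LS group.
print(low_freq_fraction([0.02, 0.05, 0.10, 0.30, 0.60]))
```

Computing this separately for nSNPs and sSNPs in each LS group would give the two series the caption describes.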

FIG. 7.

Fractions of mutations falling in each Nes range for genes in different LS classes.

FIG. 8.

Proportions of genes under positive selection and negative selection as a function of LS level. A gene is considered to be under positive (or negative) selection if the mean posterior probability of γ is positive (or negative) and the 95% Bayesian credibility interval does not overlap 0. The value of γ is estimated using the MKPRF method with the nonhierarchical model and a single Gaussian prior of γ with a mean of 0 and an SD of 8 (see Materials and Methods); these settings replicate those used by Bustamante et al. (2005). Each row contains two panels: the left panel shows fγ>0 (red bars) and f-γ>0 (gray bars), fractions of genes with 95% CI of γ completely above 0, and the right panel shows fγ<0 (red bars) and f-γ<0 (gray bars), fractions of genes with 95% CI of γ completely below 0. The results were derived from (A) all Applera SNPs (Spearman corr(LS, fγ>0) = 0.88, P = 0.0007; corr(LS, f-γ>0) = 0.89, P = 0.001; and corr(LS, fγ<0) = −0.91, P = 0.0005; corr(LS, f-γ<0) = −0.92, P = 0.005). (B) Applera SNPs with DAF ≥ 0.15 (Spearman corr(LS, fγ>0) = 0.93, P < 0.0001; corr(LS, f-γ>0) = 0.94, P < 0.0001; and corr(LS, fγ<0) = −0.66, P = 0.04; corr(LS, f-γ<0) = −0.85, P = 0.004). (C) Applera SNPs subsampled to ensure an equal portion of slightly deleterious polymorphism in all LS groups (Spearman corr(LS, fγ>0) = 0.94, P < 0.0001; corr(LS, f-γ>0) = 0.95, P < 0.0001; and corr(LS, fγ<0) = 0.56, P = 0.09; corr(LS, f-γ<0) = 0.37, P = 0.30).
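
The classification rule in this caption can be sketched for a single gene, given posterior draws of γ. This does not reproduce MKPRF or its output format. It assumes that posterior samples are available as a plain list, that the rule refers to the posterior mean of γ, and that an equal-tailed 95% interval is used; the function name classify_gamma is hypothetical.

```python
import statistics
from typing import Sequence


def classify_gamma(samples: Sequence[float], level: float = 0.95) -> str:
    """Label a gene from posterior samples of gamma.

    Returns "positive" if the posterior mean is > 0 and the interval lies
    entirely above 0, "negative" for the mirror case, else "unclassified".
    Assumption: equal-tailed credible interval taken from sorted samples.
    """
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * n)]
    hi = ordered[min(n - 1, int((1.0 - tail) * n))]
    mean = statistics.mean(ordered)

    if mean > 0 and lo > 0:
        return "positive"
    if mean < 0 and hi < 0:
        return "negative"
    return "unclassified"


# Made-up posterior draws for one hypothetical gene.
draws = [1.0 + 0.01 * i for i in range(100)]
print(classify_gamma(draws))
```

The fraction of genes labeled "positive" or "negative" within each LS group would then correspond to the bars described above.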
