pubmed.ncbi.nlm.nih.gov

From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria - PubMed

From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria

Emmanuelle Lerat et al. PLoS Biol. 2003 Oct.

Abstract

The rapid increase in published genomic sequences for bacteria presents the first opportunity to reconstruct evolutionary events on the scale of entire genomes. However, extensive lateral gene transfer (LGT) may thwart this goal by preventing the establishment of organismal relationships based on individual gene phylogenies. The group for which cases of LGT are most frequently documented and for which the greatest density of complete genome sequences is available is the gamma-Proteobacteria, an ecologically diverse and ancient group including free-living species as well as pathogens and intracellular symbionts of plants and animals. We propose an approach to multigene phylogeny using complete genomes and apply it to the case of the gamma-Proteobacteria. We first applied stringent criteria to identify a set of likely gene orthologs and then tested the compatibilities of the resulting protein alignments with several phylogenetic hypotheses. Our results demonstrate phylogenetic concordance among virtually all (203 of 205) of the selected gene families, with each of the exceptions consistent with a single LGT event. The concatenated sequences of the concordant families yield a fully resolved phylogeny. This topology also received strong support in analyses aimed at excluding effects of heterogeneity in nucleotide base composition across lineages. Our analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT. This gene set can be identified and used to yield robust hypotheses for organismal phylogenies, thus establishing a foundation for reconstructing the evolutionary transitions, such as gene transfer, that underlie diversity in genome content and organization.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no conflicts of interest exist.

Figures

Figure 1
Figure 1. Number of Genes and Species within the Gene Families

(A) Distribution of the number of genes contained in the homolog families. (B) Number of orphan genes in each species in parentheses. Abbreviations: Ba, Buchnera aphidicola; Ec, Escherichia coli; Hi, Haemophilus influenzae; Pa, Pseudomonas aeruginosa; Pm, Pasteurella multocida; St, Salmonella typhimurium; Vc, Vibrio cholerae; Wb, Wigglesworthia brevipalpis; Xa, Xanthomonas axonopodis; Xc, Xanthomonas campestris; Xf, Xylella fastidiosa; Yp CO92, Yersinia pestis CO_92; Yp KIM, Yersinia pestis KIM. (C) Distribution of the number of species contained in the homolog families.

Figure 2
Figure 2. The 13 Candidate Topologies

Topologies 1–4 correspond to tree reconstructions based on SSU rRNA. Topologies 5 and 6 correspond to the trees based on the concatenation of the proteins. Topologies 7–13 correspond to additional topologies constructed to test the sister relationship of the two symbiont species. Species abbreviations as in Figure 1. Abbreviations: ML, maximum likelihood; NJ, neighbor joining; K, Kimura distance; G&G, Galtier and Gouy distance; γ, gamma-based method for correcting the rate heterogeneity among sites. The position of the root corresponds to the one obtained repeatedly using SSU rRNA.

Figure 3
Figure 3. Result of the SH Test

The graph shows the number of alignments accepting or rejecting each topology. The “Other Topologies” are those built to test the sister relationship of Wigglesworthia and Buchnera. The “Proteins” topologies are those obtained using both the protein concatenation and the consensus of trees from all 205 alignments. The “SSU rRNA” topologies were obtained using the SSU rRNA sequences with different methods.

Figure 4
Figure 4. Phylogenies for the Laterally Transferred Genes

(A) ML trees obtained for BioB (left) and MviN (right). (B) NJ trees obtained for BioB (left) and MviN (right). Abbreviations: Pf, Pseudomonas fluorescens; Pp, Pseudomonas putida; Ps, Pseudomonas syringae. Other species abbreviations as in Figure 1.

Figure 5
Figure 5. Tree Based on the Concatenation of the 205 Proteins (NJ)

The topology shown agrees with almost all individual gene alignments (topology 5 of Figure 2). The same tree is obtained after removing the two genes showing evidence for LGT. The position of the root corresponds to the one obtained repeatedly using SSU rRNA.

Figure 6
Figure 6. Similarity Levels for Pairwise Comparisons of Genes from Two Representative Genome Pairs

Frequency distribution of the ratio (bit score/maximal bit score) in a BLASTP query of the proteins from E. coli on the proteins from the genomes of Salmonella enterica (solid line) and Vibrio cholerae (dashed line). The ratio of 0.3 allows identification of most homologs but excludes probable nonspecific matches (NS).

Similar articles

Cited by

References

    1. Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, et al. Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia . Nat Genet. 2002;32:402–407. - PubMed
    1. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. - PMC - PubMed
    1. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, et al. GenBank. Nucleic Acids Res. 2002;30:17–20. - PMC - PubMed
    1. Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–1474. - PubMed
    1. Brochier C, Philippe H, Moreira D. The evolutionary history of ribosomal protein RpS14: Horizontal gene transfer at the heart of the ribosome. Trends Genet. 2000;16:529–533. - PubMed

Publication types

MeSH terms

LinkOut - more resources