Bacterial phylogenetic tree construction based on genomic translation stop signals

Sun Jan 01 2012

Lijing Xu et al. Microb Inform Exp. 2012.

Abstract

Background: The stop codons TAA, TAG, and TGA do not terminate protein synthesis with equal efficiency. These variations could allow many genes to be regulated. Many similar nucleotide trimers are found on the second and third reading-frames of a gene; they are called premature stop codons (PSC). Like stop codons, the PSC in bacterial genomes are highly biased in their quantities and qualities on the genes. Phylogenetically related species often share a similar PSC profile. We want to know whether the selective forces that influence the usage biases of the stop codons and of the PSC in a genome are related. We also wish to know how strongly these trimers in a genome are related to the natural history of the bacterium. Knowing these relations may improve our understanding of the phylogeny of bacteria.

Results: A 16S rRNA alignment tree of 19 well-studied α-, β- and γ-Proteobacteria type species is used as the standard reference for bacterial phylogeny. The genomes of sixty-one bacteria, belonging to the α-, β- and γ-Proteobacteria subphyla, are used for this study. The stop codons and PSC are collectively termed "Translation Stop Signals" (TSS). A gene is represented by nine scalars corresponding to the counts of TAA, TAG, and TGA on each of the three reading-frames of that gene. The "Translation Stop Signals Ratio" (TSSR) is the ratio between the TSS counts. Four types of TSSR are investigated. TSSR-1, TSSR-2 and TSSR-3 are each a 3-scalar series corresponding respectively to the average ratio of TAA: TAG: TGA on the first, second, and third reading-frames of all genes in a genome. The Genomic-TSSR is a 9-scalar series representing the ratio of distribution of all TSS on the three reading-frames of all genes in a genome. Results show that grouping bacteria by their similarities in TSSR-1, TSSR-2, or TSSR-3 values could only partially resolve the phylogeny of the species. However, grouping bacteria based on their Genomic-TSSR values resulted in clusters of bacteria identical to the bacterial clusters of the reference tree. Unlike the 16S rRNA method, the Genomic-TSSR tree is also able to separate closely related species/strains at high resolution. Species and strains separated by the Genomic-TSSR grouping method are often in good agreement with those classified by other taxonomic methods. Correspondence analysis of individual genes shows that most genes in a bacterial genome share a similar TSSR value. However, within a chromosome, the Genic-TSSR values of genes near the replication origin region (Ori) are more similar to each other than those of genes near the terminus region (Ter).
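The nine-scalar gene representation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the function names (`genic_tss_counts`, `to_ratio`, `genomic_tssr`), the ordering of the nine scalars (TAA, TAG, TGA on frame 1, then frame 2, then frame 3), and the normalisation of counts to fractions that sum to one are all assumptions. The Genomic-TSSR is taken here as the average of the per-gene ratios, following the Figure 3 caption.

```python
from typing import List, Sequence

STOP_TRIMERS = ("TAA", "TAG", "TGA")


def genic_tss_counts(gene: str) -> List[int]:
    """Count TAA, TAG and TGA on each of the three reading-frames of one gene.

    Returns nine integers ordered as frame 1 (TAA, TAG, TGA), frame 2, frame 3.
    The ordering is an assumption made for this sketch.
    """
    seq = gene.upper()
    counts: List[int] = []
    for frame in range(3):
        # Non-overlapping trimers read from this frame offset.
        trimers = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        for stop in STOP_TRIMERS:
            counts.append(trimers.count(stop))
    return counts


def to_ratio(counts: Sequence[int]) -> List[float]:
    """Normalise counts to fractions of the total (assumed form of the ratio)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def genomic_tssr(genes: Sequence[str]) -> List[float]:
    """Average the per-gene (Genic) ratios over all genes of a genome."""
    ratios = [to_ratio(genic_tss_counts(g)) for g in genes]
    n = len(ratios)
    return [sum(r[k] for r in ratios) / n for k in range(9)]


if __name__ == "__main__":
    # Two toy sequences, not real genes.
    print(genic_tss_counts("ATGAAATAA"))
    print(genomic_tssr(["ATGAAATAA", "ATGTAGTGA"]))
```

Restricting the same counts to one frame (three scalars instead of nine) would give the frame-specific TSSR-1, TSSR-2 and TSSR-3 under the same assumptions.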

Conclusion: The translation stop signals on the three reading-frames of the genes on a bacterial genome are interrelated, possibly due to frequent off-frame recombination facilitated by translational-associated recombination (TSR). However, TSR may not occur randomly in a bacterial chromosome. Genes near the Ori region are often highly expressed, and a bacterium always maintains multiple copies of Ori. Frequent collisions between DNA polymerase and RNA polymerase would create many DNA strand-breaks on the genes, and homologous recombination induced by DNA strand-breaks is more likely to take place between genes with similar sequences. Thus, localized recombination could explain why the TSSR values of genes near the Ori region are more similar to each other. The quantity and quality of these TSS in a genome strongly reflect the natural history of a bacterium. We propose that the Genomic-TSSR can be used as a subjective biomarker to represent the phyletic status of a bacterium.

Figures

Figure 1

Species correlations based on reading-frame-specific translation stop signals. Hierarchical clustering of 61 bacteria: (A) correlation based on the genomic translation stop signals ratios on the first reading frames (TSSR-1); (B) correlation based on the genomic translation stop signals ratios on the second reading frames (TSSR-2); and (C) correlation based on the genomic translation stop signals ratios on the third reading frames (TSSR-3). Correlation distance is between zero and one, with zero being 100% similar and one being no correlation.

Figure 2

16S rRNA alignment reference tree. A phylogenetic reference tree is constructed from the 16S rRNA sequence alignment of 19 type species (see Table 1). This standard tree was used to validate the accuracy of other trees built from bacterial translation stop signal profiles.

Figure 3

Species correlation based on genomic translation stop signals on all three reading-frames. Distance correlation of 61 bacteria based on their Genomic Translation Stop Signals Ratio. A species is represented by the average value of all its Genic-TSSR (Genomic-TSSR). The Genomic-TSSR values of 61 bacterial genomes were grouped by hierarchical clustering using City-Block distance and complete linkage. Parentheses show the genomic size and GC ratio of each species. Correlation distance is between zero and one, with zero being 100% similar and one being no correlation.
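The clustering named in this caption (City-Block distance with complete linkage) can be illustrated with a small pure-Python sketch. It is a naive agglomerative procedure written for clarity, not the software used in the study; the function names and the toy profiles are hypothetical, and any rescaling of distances to the zero-to-one range mentioned in the caption is left out.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def city_block(u: Sequence[float], v: Sequence[float]) -> float:
    """City-Block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(
    profiles: Dict[str, Sequence[float]]
) -> List[Tuple[List[str], List[str], float]]:
    """Agglomerative clustering with complete linkage.

    `profiles` maps a species name to its profile vector (for example a
    9-scalar Genomic-TSSR). Returns the merge history as
    (members of cluster A, members of cluster B, linkage distance).
    """
    clusters = [frozenset([name]) for name in profiles]
    merges: List[Tuple[List[str], List[str], float]] = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Complete linkage: cluster distance is the largest
            # pairwise distance between their members.
            d = max(city_block(profiles[x], profiles[y]) for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), d))
    return merges


if __name__ == "__main__":
    # Toy two-dimensional profiles, not real Genomic-TSSR values.
    toy = {"A": [0, 0], "B": [1, 0], "C": [5, 5]}
    for step in complete_linkage(toy):
        print(step)
```

For 61 genomes this cubic-time loop is still fast; a library routine such as SciPy's hierarchical clustering would be the usual choice for larger data sets.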

Figure 4

Correspondence Analysis of individual genes from four different species. Five hundred genes were randomly selected from each of the genomes of four different bacteria for correspondence analysis (CA). At 95% confidence, four clusters of genes could be recognized: Escherichia coli CFT073 (ECOL, solid line, Turquoise), Salmonella typhimurium LTS (SALM, dashed line, Green), Rickettsia typhi Wilmington (RICK, dotted-dashed line, Dark Blue) and Neisseria meningitidis Mc58 (NEIS, dotted line, Red). Also shown are the centroids of the genes of E. coli (E), S. typhimurium (S), R. typhi (R), and N. meningitidis (N).

Figure 5

Kolmogorov-Smirnov test for discrete distributions of Genic-TSSRs on a chromosome. A two-sided Kolmogorov-Smirnov test (KS-test) comparing the average percentage of counts of each translation stop signal among 400 genes from Escherichia coli K12, with the genes assigned to categories in three ways: (A) genes are assigned based on their location on each replichore (Left (∆) vs. Right (▼)); (B) genes are assigned based on their orientation on the leading or lagging strands of the DNA (Forward (■) vs. Reverse (□)); or (C) genes are assigned based on their proximity to the replication origin or terminus (Ori (●) vs. Ter (○)). The inset shows the standard deviation of the average percentage of counts of all nine types of translation stop signals in each gene assignment.
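As a rough illustration of the comparison in this caption, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distributions of two groups of values. The sketch below computes only that statistic; the p-value step of the test is omitted, the function name `ks_statistic` is hypothetical, and the example numbers are toy values rather than data from the study.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Empirical CDF value: fraction of each sample that is <= x.
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    # Toy percentages for two hypothetical gene groups (e.g. Ori vs. Ter).
    ori = [1, 2, 3, 4]
    ter = [3, 4, 5, 6]
    print(ks_statistic(ori, ter))
```

A statistic near zero means the two groups have nearly identical distributions; a value near one means they barely overlap.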

