Orthologous gene clusters and taxon signature genes for viruses of prokaryotes

David M Kristensen et al. J Bacteriol. 2013 Mar.

Abstract

Viruses are the most abundant biological entities on earth and encompass a vast amount of genetic diversity. The recent rapid increase in the number of sequenced viral genomes has created unprecedented opportunities for gaining new insight into the structure and evolution of the virosphere. Here, we present an update of the phage orthologous groups (POGs), a collection of 4,542 clusters of orthologous genes from bacteriophages that now also includes viruses infecting archaea and encompasses more than 1,000 distinct virus genomes. Analysis of this expanded data set shows that the number of POGs keeps growing without saturation and that a substantial majority of the POGs remain specific to viruses, lacking homologues in prokaryotic cells, outside known proviruses. Thus, the great majority of virus genes apparently remains to be discovered. A complementary observation is that numerous viral genomes remain poorly, if at all, covered by POGs. The genome coverage by POGs is expected to increase as more genomes are sequenced. Taxon-specific, single-copy signature genes that are not observed in prokaryotic genomes outside detected proviruses were identified for two-thirds of the 57 taxa (those with genomes available from at least 3 distinct viruses), with half of these present in all members of the respective taxon. These signatures can be used to specifically identify the presence and quantify the abundance of viruses from particular taxa in metagenomic samples and thus gain new insights into the ecology and evolution of viruses in relation to their hosts.

Figures

Fig 1

Proportions of prokaryotic virus types (dsDNA phages, dsDNA archaeal viruses, ssDNA, ssRNA, or dsRNA) in the data set and distribution of the number of protein-coding genes in virus genomes. The inset shows in more detail the part of the distribution that includes small virus genomes with <20 protein-coding genes.

Fig 2

Functions and sizes of the 20 largest POGs. When the number of proteins (dark blue) is greater than the number of organisms (light blue), the excess is due to paralogy.

Fig 3

Distribution of the number of organisms in POGs, with the inset using a log scale on the y axis. The color scheme is the same as that for Fig. 1.

Fig 4

Network of phage genomes. The genome of each phage is represented as a box, colored according to the indicated taxonomic affiliation (dsDNA viruses with bacterial hosts except where specified otherwise), with connections drawn between genomes that share at least one POG. The distances between genomes are inversely proportional to the number of genes shared between neighbors. The inset is a zoomed-in region of the tightly connected subnetwork among the tailed phages.
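
The edge rule in this caption (two genomes are connected if they share at least one POG) can be illustrated with a minimal Python sketch. The function name `shared_pog_edges`, the input layout (a mapping from POG identifier to the genomes that contain it), and the toy identifiers are assumptions made for illustration; they are not taken from the paper, and the authors' actual network construction and layout method is not described here.

```python
from collections import defaultdict
from itertools import combinations


def shared_pog_edges(pog_members):
    """Count shared POGs for every pair of genomes.

    pog_members: dict mapping a POG id to an iterable of genome ids
    (hypothetical input layout). Returns a dict mapping a sorted
    (genome_a, genome_b) pair to the number of POGs the pair shares.
    Pairs that share no POG are absent, i.e. they have no edge.
    """
    edges = defaultdict(int)
    for genomes in pog_members.values():
        # De-duplicate so paralogs within one genome count only once.
        for a, b in combinations(sorted(set(genomes)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy data: identifiers are made up for illustration.
toy = {
    "POG1": ["g1", "g2", "g3"],
    "POG2": ["g1", "g2", "g2"],  # g2 carries two paralogs
    "POG3": ["g4"],              # singleton POG creates no edge
}
edges = shared_pog_edges(toy)
```

A layout in which distance shrinks as the number of shared genes grows could, for example, use `1 / edges[pair]` as a target edge length; that choice is an assumption, not a statement of how the figure was drawn.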

Fig 5

Distribution of the frequency of POGs with the indicated range of VQ. The inset shows the y axis on a log scale. The color scheme is the same as that for Fig. 1.

Fig 6

Number and percentage of taxa that can be represented by at least one signature gene, with precision fixed at 100% and recall (x axis) allowed to vary. (a) The dependence of signatures on VQ value. (b) Breakdown of signatures into taxonomic levels.
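
To make the precision and recall terms in this caption concrete, here is a minimal Python sketch under the usual definitions: precision is the fraction of genomes carrying the gene that belong to the taxon, and recall is the fraction of the taxon's genomes that carry the gene. The function name `signature_stats` and the toy genome identifiers are hypothetical, and the sketch leaves out the single-copy and provirus checks mentioned in the abstract.

```python
def signature_stats(pog_genomes, taxon_genomes):
    """Return (precision, recall) of a POG as a marker for one taxon.

    pog_genomes: genomes in which the POG is found (hypothetical input).
    taxon_genomes: all genomes assigned to the taxon of interest.
    """
    pog = set(pog_genomes)
    taxon = set(taxon_genomes)
    hits = pog & taxon  # taxon members that carry the gene
    precision = len(hits) / len(pog) if pog else 0.0
    recall = len(hits) / len(taxon) if taxon else 0.0
    return precision, recall


# Toy example: the gene occurs only inside the taxon (precision 1.0)
# but is missing from one of its four members (recall 0.75).
taxon = ["phageA", "phageB", "phageC", "phageD"]
pog = ["phageA", "phageB", "phageC"]
precision, recall = signature_stats(pog, taxon)
```

With precision fixed at 100%, as in the figure, a candidate gene is accepted only if `precision == 1.0`; the recall threshold is then the quantity that varies along the x axis.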
