Experimental determination and system level analysis of essential genes in Escherichia coli MG1655

S Y Gerdes et al. J Bacteriol. 2003 Oct.

Abstract

Defining the gene products that play an essential role in an organism's functional repertoire is vital to understanding the system level organization of living cells. We used a genetic footprinting technique for a genome-wide assessment of genes required for robust aerobic growth of Escherichia coli in rich media. We identified 620 genes as essential and 3,126 genes as dispensable for growth under these conditions. Functional context analysis of these data allows individual functional assignments to be refined. Evolutionary context analysis demonstrates a significant tendency of essential E. coli genes to be preserved throughout the bacterial kingdom. Projection of these data over metabolic subsystems reveals topologic modules with essential and evolutionarily preserved enzymes with reduced capacity for error tolerance.

Figures

FIG. 1.

Distribution of transposon insertion densities, densities of essential genes, and ERIs along the E. coli chromosome. (A) Gray lines show the transposon insertion densities calculated as the number of transposition events per 100-kb sliding window over the entire E. coli MG1655 chromosome. Values indicated by the blue lines were computed in a similar manner, except that all chromosomal regions corresponding to essential and ambiguous genes were excluded from the calculations in order to reconstruct insert distribution prior to selective outgrowth (see also Materials and Methods). Gaps in the data (chromosomal regions where transposition events could not be detected due to technical reasons) are indicated by short vertical lines along the x axis. These regions were excluded from all analyses. Nucleotide positions of the E. coli genome sequence correspond to those in reference . The regions where the distributions of transposition events significantly deviate (P < 0.01) from a Poisson process are marked by horizontal green lines. oriC shows the origin of chromosomal replication, and dif denotes the dif locus within the replication termination area. (B) Distribution of essential genes along the E. coli chromosome, defined as a percentage of essential genes in the total number of genes within a 100-kb-long chromosomal region (calculated per sliding window as described above). The regions where the numbers of essential genes significantly deviate (P < 0.01) from values that could arise by chance are marked by horizontal green lines. (C) ERIs along the E. coli chromosome, defined as the average ERI for all genes within each 100-kb region. The ERI for a gene is defined as the fraction of organisms in a diverse set of 33 bacterial species which contain an ortholog of the gene in their genomes.
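The two quantities defined in this caption, the per-window insertion density and the ERI, can be illustrated with a minimal sketch. The function names, the step size between windows, and the treatment of the chromosome as linear (no wrap-around at the origin) are assumptions made for illustration; the caption specifies only the 100-kb window and the 33-genome reference set.

```python
from bisect import bisect_left


def window_counts(positions, genome_length, window=100_000, step=10_000):
    """Count insertion positions in each half-open window [start, start + window).

    The step size is an assumption; the caption states only the window width.
    The chromosome is treated as linear here, so windows do not wrap around.
    """
    pos = sorted(positions)
    counts = []
    for start in range(0, genome_length - window + 1, step):
        lo = bisect_left(pos, start)
        hi = bisect_left(pos, start + window)
        counts.append((start, hi - lo))
    return counts


def eri(presence):
    """ERI of a gene: fraction of reference genomes that contain an ortholog.

    `presence` is a sequence of booleans, one per reference genome
    (33 genomes in the caption).
    """
    return sum(1 for p in presence if p) / len(presence)
```

For example, a gene with orthologs in 30 of the 33 reference genomes would have an ERI of 30/33 under this definition.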

FIG. 2.

Essentiality of genes controlling amino acid biosynthesis in E. coli. (A) Functional overview of amino acid biosynthesis. Each block represents one or more pathways leading to production of a particular amino acid or its key intermediates (shown in smaller boxes). Within each block, stacked bars represent the gene products involved in the pathway (according to SWISS-PROT release of June 2002). Bars are colored according to gene essentiality (green, nonessential; red, essential; gray, undefined). (B) Detailed representation of the lysine biosynthetic pathway. Genes predicted in the ERGO database to be paralogs in this pathway are shown, in addition to genes whose roles in the biosynthesis of lysine have been experimentally verified (in bold).

FIG. 3.

Distribution of E. coli genes as a function of ERIs. (A) Total number of genes with an ERI above the threshold plotted versus the ERI threshold. Color coding within bars represents fractions of essential (red), nonessential (green), ambiguous (yellow), and missing (gray) genes for each incremental increase of ERI threshold (with 33 diverse genomes in the reference set). (B) Fractions of essential genes at different ERI values. The data were fitted with the function $y = y_0 + a e^{bx}$, where $y_0$ is 12.0 ± 0.9, $a$ is 0.023 ± 0.019, and $b$ is 7.8 ± 0.8 (dashed red line). The dotted line represents the fraction of essential genes for the whole genome. (The fractions plotted are defined as the number of essential genes (E) divided by the total number of essential and nonessential (N) genes, E/(E + N). Unknown or ambiguous genes are not taken into account.)
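As a rough illustration of panel B, the fitted curve and the essential-gene fraction can be evaluated as follows. This sketch uses only the central parameter values from the caption and ignores their uncertainties. It assumes that x is the ERI on a 0 to 1 scale and that y is a percentage, which is inferred from the magnitude of the parameters and not stated in the caption. The function names are hypothetical.

```python
import math

# Central values of the fitted parameters quoted in the caption.
Y0 = 12.0
A = 0.023
B = 7.8


def fitted_essential_percent(eri):
    """Evaluate y = y0 + a * exp(b * x) at x = eri.

    Assumes eri lies in [0, 1] and y is expressed in percent.
    """
    return Y0 + A * math.exp(B * eri)


def essential_fraction(n_essential, n_nonessential):
    """Fraction E / (E + N); ambiguous or unknown genes are excluded."""
    return n_essential / (n_essential + n_nonessential)


if __name__ == "__main__":
    for x in (0.0, 0.5, 0.8, 1.0):
        print(f"ERI {x:.1f}: fitted essential share {fitted_essential_percent(x):.1f}%")
    # Gene counts reported in the abstract: 620 essential, 3,126 dispensable.
    print(f"Whole-genome fraction: {essential_fraction(620, 3126):.3f}")
```

With the counts from the abstract, the whole-genome fraction is 620/3,746, or roughly 17%. Whether the dotted line in the figure sits at exactly this value is not stated in the caption.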

FIG. 4.

Distribution of essential genes among functional categories as a function of ERI thresholds. Functional categories are color coded and specified by three-letter designations as in Table 2. Within every threshold group, each bar represents the fraction (percent plotted on y axis) of all categorized essential genes corresponding to the number of essential genes in a given category (x axis) with ERI values above the set threshold (z axis).

FIG. 5.

E. coli genes found to be essential and preserved in over 80% of diverse bacterial genomes (ERI > 0.8). These universal essential genes are grouped by functional categories (described in Table 2). NTP, nucleotide triphosphate; FMN, flavin mononucleotide; FAD, flavin adenine dinucleotide; CoA, coenzyme A; TCA, tricarboxylic acid cycle; PRPP, phosphoribosyl pyrophosphate.

FIG. 6.

The evolutionary retention and essentiality ratio of enzymes in the topologic modules of E. coli metabolism. The hierarchical tree derived from the topologic overlap matrix of E. coli metabolism, which quantifies the relation between the various modules, is shown as previously described (28). The branches of the tree are color coded according to the fraction of essential enzymes (top panel) and the average ERI score of enzymes (bottom panel) catalyzing the biochemical reactions within a given topologic module. Red indicates a 100% essentiality/conservation ratio within a module. Essentiality is not uniformly distributed across modules (branches): a few small modules have very high fractions of essential enzymes, while the majority of modules contain few or no essential enzymes. A similar segregation of modules with high evolutionary conservation is observed in the bottom panel, and their locations often correlate with those of the high-essentiality modules. The predominant biochemical classes of substrates used to group the metabolites are shown. Polysacch., polysaccharide; disacch., disaccharide; monosacch., monosaccharide; met. sugar alc., metabolic sugar alcohols.
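The two per-module quantities used for color coding, the fraction of essential enzymes and the average ERI, amount to a simple grouped summary. The sketch below assumes a hypothetical input of (module, is_essential, ERI) records; it does not reproduce the construction of the hierarchical tree or the topologic overlap matrix.

```python
from collections import defaultdict


def module_summary(enzymes):
    """Return {module: (essential_fraction, mean_eri)}.

    `enzymes` is an iterable of (module, is_essential, eri) tuples; this
    record layout is an assumption made for illustration.
    """
    grouped = defaultdict(list)
    for module, is_essential, eri_value in enzymes:
        grouped[module].append((is_essential, eri_value))

    summary = {}
    for module, rows in grouped.items():
        n = len(rows)
        essential_share = sum(1 for essential, _ in rows if essential) / n
        mean_eri = sum(value for _, value in rows) / n
        summary[module] = (essential_share, mean_eri)
    return summary
```

A module in which every enzyme is essential yields an essential fraction of 1.0, which corresponds to the red (100%) end of the color scale in the top panel.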

References

    1. Akerley, B. J., E. J. Rubin, V. L. Novick, K. Amaya, N. Judson, and J. J. Mekalanos. 2002. A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae. Proc. Natl. Acad. Sci. USA 99:966-971.
    2. Anderson, R. P., and J. R. Roth. 1978. Tandem chromosomal duplications in Salmonella typhimurium: fusion of histidine genes to novel promoters. J. Mol. Biol. 119:147-166.
    3. Badarinarayana, V., P. W. Estep III, J. Shendure, J. Edwards, S. Tavazoie, F. Lam, and G. M. Church. 2001. Selection analyses of insertional mutants using subgenic-resolution arrays. Nat. Biotechnol. 19:1060-1065.
    4. Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
    5. Csete, M. E., and J. C. Doyle. 2002. Reverse engineering of biological complexity. Science 295:1664-1669.
