
From genus to phylum: large-subunit and internal transcribed spacer rRNA operon regions show similar classification accuracies influenced by database composition

Comparative Study


Andrea Porras-Alfaro et al. Appl Environ Microbiol. 2014 Feb.

Abstract

We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and of the 5' section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and the use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks, from kingdom to family, and the differences between them at the genus level were small (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but ITS1 and ITS2 showed higher accuracy when shorter fragments of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar classification accuracy for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provides comparable classification accuracy down to the genus level, and they underscore the need for larger and more diverse classification training sets.
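The abstract does not spell out how a naive Bayesian classifier with a bootstrap cutoff assigns a genus. The sketch below assumes an RDP Classifier-style design: each sequence is reduced to overlapping 8-mer words, genus likelihoods come from smoothed word frequencies, and confidence is the percentage of bootstrap replicates (each resampling one-eighth of the query's words) that return the same genus. The class and function names, the word size, and the toy sequences are assumptions for illustration, not the authors' code.

```python
import math
import random
from collections import defaultdict

K = 8  # assumed word size (RDP-style classifiers use 8-mers)


def kmers(seq, k=K):
    """Set of overlapping k-mer 'words' in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesKmer:
    """Hypothetical genus-level naive Bayesian classifier over k-mer presence."""

    def __init__(self, k=K):
        self.k = k

    def fit(self, seqs, genera):
        n = len(seqs)
        word_sets = [kmers(s, self.k) for s in seqs]
        doc_freq = defaultdict(int)  # training sequences containing each word
        for ws in word_sets:
            for w in ws:
                doc_freq[w] += 1
        # Word prior P_w = (n_w + 0.5) / (N + 1); unseen words get n_w = 0.
        self.prior = {w: (c + 0.5) / (n + 1) for w, c in doc_freq.items()}
        self.unseen_prior = 0.5 / (n + 1)
        self.genera = sorted(set(genera))
        self.word_count = {g: defaultdict(int) for g in self.genera}
        self.n_seqs = defaultdict(int)
        for ws, g in zip(word_sets, genera):
            self.n_seqs[g] += 1
            for w in ws:
                self.word_count[g][w] += 1
        return self

    def _log_likelihood(self, genus, words):
        # P(w | genus) = (m_w + P_w) / (M + 1), summed in log space.
        denom = self.n_seqs[genus] + 1
        counts = self.word_count[genus]
        return sum(
            math.log((counts.get(w, 0) + self.prior.get(w, self.unseen_prior)) / denom)
            for w in words
        )

    def _best(self, words):
        return max(self.genera, key=lambda g: self._log_likelihood(g, words))

    def classify(self, seq, n_boot=100, seed=0):
        """Return (genus, bootstrap confidence in percent)."""
        words = sorted(kmers(seq, self.k))  # sorted for reproducible resampling
        if not words:
            raise ValueError("query is shorter than the word size")
        best = self._best(words)
        rng = random.Random(seed)
        n_sub = max(1, len(words) // 8)  # each replicate resamples 1/8 of the words
        agree = sum(
            self._best(rng.choices(words, k=n_sub)) == best for _ in range(n_boot)
        )
        return best, 100.0 * agree / n_boot


def assign(model, seq, cutoff=50.0):
    """Apply a bootstrap cutoff: below it the query stays unclassified (None)."""
    genus, conf = model.classify(seq)
    return genus if conf >= cutoff else None


# Toy demo: two made-up "genera" built from random 200-bp sequences.
_rng = random.Random(1)
base_a = "".join(_rng.choice("ACGT") for _ in range(200))
base_b = "".join(_rng.choice("ACGT") for _ in range(200))
SWAP = {"A": "C", "C": "G", "G": "T", "T": "A"}


def point_mutant(seq, pos):
    """Hypothetical helper: change one base to make a near-identical variant."""
    return seq[:pos] + SWAP[seq[pos]] + seq[pos + 1:]


train = [point_mutant(base_a, p) for p in (10, 90, 170)]
train += [point_mutant(base_b, p) for p in (20, 100, 180)]
labels = ["GenusA"] * 3 + ["GenusB"] * 3
model = NaiveBayesKmer().fit(train, labels)
query = point_mutant(base_a, 50)
genus, conf = model.classify(query)
print(genus, conf)
```

With `cutoff=50.0`, `assign` returns `None` for any query whose bootstrap support is under 50%, which is how a cutoff trades the number of classified sequences for genus-level accuracy.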


Figures

FIG 1

Primer locations in the ITS region, showing the variable ITS1 and ITS2 regions and sequence lengths in the ITS-LSU training set. Two fragment extraction methods were used. Fragment sizes are shown with arrows.

FIG 2

Classification accuracy at each taxonomic level using different rRNA gene regions for leave-one-out cross-validation (LOOCV) testing with the naive Bayesian classifier (NBC) and BLASTN approaches. Numbers are percentages of correctly classified query sequences from the ITS-LSU database. The naive Bayesian classifier was trained on full-length sequences without a bootstrap cutoff.
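LOOCV holds each sequence out of the training set in turn, classifies it against the remaining sequences, and scores the prediction at every rank. The sketch below assumes a six-rank lineage per record and uses a crude nearest-neighbor match on shared 8-mers as a stand-in classifier; it is neither BLASTN nor the naive Bayesian classifier, and all function names and toy lineages are hypothetical.

```python
import random
from collections import Counter

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def kmers(seq, k=8):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nearest_neighbor(query, reference):
    """Stand-in classifier (not BLASTN): lineage of the reference sharing most k-mers."""
    q = kmers(query)
    return max(reference, key=lambda rec: len(q & kmers(rec[0])))[1]


def loocv_accuracy(records, classify=nearest_neighbor):
    """records: list of (sequence, lineage tuple aligned with RANKS).
    Returns the percentage of held-out queries classified correctly at each rank."""
    correct = Counter()
    for i, (seq, truth) in enumerate(records):
        reference = records[:i] + records[i + 1:]  # leave the query out
        predicted = classify(seq, reference)
        for rank, p, t in zip(RANKS, predicted, truth):
            if p == t:
                correct[rank] += 1
    n = len(records)
    return {rank: 100.0 * correct[rank] / n for rank in RANKS}


def mutant(seq, positions):
    """Hypothetical helper: substitute bases at the given positions."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    s = list(seq)
    for p in positions:
        s[p] = swap[s[p]]
    return "".join(s)


rng = random.Random(7)
base1, base2, base3 = ("".join(rng.choice("ACGT") for _ in range(200)) for _ in range(3))
g1 = ("Fungi", "P1", "C1", "O1", "F1", "G1")
g2 = ("Fungi", "P1", "C1", "O1", "F1", "G2")
g4 = ("Fungi", "P1", "C1", "O1", "F1", "G4")  # singleton genus
g3 = ("Fungi", "P2", "C2", "O2", "F2", "G3")
records = [
    (mutant(base1, [40]), g1), (mutant(base1, [140]), g1),
    (mutant(base2, [30]), g2), (mutant(base2, [130]), g2),
    (mutant(base2, [60, 90, 160]), g4),
    (mutant(base3, [50]), g3), (mutant(base3, [150]), g3),
]
acc = loocv_accuracy(records)
print(acc)
```

When the singleton genus G4 is held out, no member of its own genus remains in the reference, so it is matched to a close G2 relative: it is still correct at family and above but counts as a genus-level miss, giving 6 of 7 (about 85.7%) at genus and 100% at every other rank. This mirrors, in miniature, how taxa missing from a training set lower genus-level accuracy.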

FIG 3

Classification accuracy comparison for the ITS1, ITS2, and LSU regions using the naive Bayesian classifier without a bootstrap cutoff or with a 50% bootstrap cutoff. The y axis shows the percentages of LOOCV sequences that were accurately classified. (A) ITS1 region; (B) ITS2 region; (C) LSU region. Query sequences of three lengths were extracted using the primer-anchored method.
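The captions refer to a primer-anchored extraction of fixed-length query fragments without defining it. One plausible reading, sketched below purely as an assumption, is to locate a primer site and take the first N bases downstream of it, skipping any length the sequence cannot supply. The primer string and the function name are hypothetical and are not the primers or code used in the study.

```python
PRIMER = "GGTCATTTAGAG"  # hypothetical primer site, for illustration only


def primer_anchored_fragments(seq, primer, lengths=(100, 150, 200)):
    """Return {length: fragment} for fragments starting right after the primer match.

    Lengths that would run past the end of the sequence are omitted, so every
    returned fragment has exactly the requested length.
    """
    start = seq.find(primer)
    if start < 0:
        return {}  # primer site not found: nothing to extract
    start += len(primer)
    return {n: seq[start:start + n] for n in lengths if start + n <= len(seq)}


demo = "AAAA" + PRIMER + "ACGT" * 60  # 240 bp downstream of the primer
print({n: len(f) for n, f in primer_anchored_fragments(demo, PRIMER).items()})
```

Anchoring every fragment at the same site keeps fragments of a given length comparable across sequences, which matters when accuracy is compared by fragment size.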

FIG 4

Percentage of sequences removed from the ITS-LSU training set after applying a 50% bootstrap cutoff with the naive Bayesian classifier. The y axis shows the percentage of sequences removed. Query sequences of three sizes were extracted using the primer-anchored method.

FIG 5

Comparison of full-length ITS1, ITS2, and entire ITS sections classified with the naive Bayesian classifier (NBC) or BLASTN. The classifier was used with and without a 50% bootstrap cutoff.

FIG 6

Accuracy comparison of the ITS1 and ITS2 sections using the naive Bayesian classifier at its default setting and with a 50% bootstrap cutoff, for exact-length sequences of 100 bp (A), 150 bp (B), and 200 bp (C). The y axes show accuracy (lines) and the percentage of sequences removed (bars).
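These figures plot two quantities per setting: accuracy and the share of sequences removed by the cutoff. The sketch below assumes that queries whose bootstrap support falls below the cutoff are removed and that accuracy is computed over the queries that remain; the `Result` type, the `summarize` function, and the toy values are made up for illustration and are not data from the study.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    predicted: str
    truth: str
    confidence: float  # bootstrap support, in percent


def summarize(results, cutoff: Optional[float] = None):
    """Return (accuracy %, removed %); cutoff=None mimics running without a cutoff."""
    if not results:
        raise ValueError("no results to summarize")
    kept = [r for r in results if cutoff is None or r.confidence >= cutoff]
    removed = 100.0 * (len(results) - len(kept)) / len(results)
    if not kept:
        return float("nan"), removed  # everything fell below the cutoff
    accuracy = 100.0 * sum(r.predicted == r.truth for r in kept) / len(kept)
    return accuracy, removed


# Toy values only: two low-support queries are also the two misclassified ones.
results = [
    Result("G1", "G1", 98.0),
    Result("G2", "G2", 87.0),
    Result("G3", "G1", 42.0),
    Result("G2", "G2", 55.0),
    Result("G4", "G5", 31.0),
]
print(summarize(results))
print(summarize(results, cutoff=50.0))
```

On these toy values, the 50% cutoff raises accuracy from 60% to 100% while removing 40% of the queries, the same kind of trade-off between accuracy (lines) and removed sequences (bars) that the panels display.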
