Dynamics and associations of microbial community types across the human body
Author manuscript; available in PMC: 2015 May 15.
Published in final edited form as: Nature. 2014 Apr 16;509(7500):357–360. doi: 10.1038/nature13178
Abstract
A primary goal of the Human Microbiome Project (HMP) was to provide a reference collection of 16S rRNA gene sequences collected from sites across the human body that would allow microbiologists to better associate changes in the microbiome with changes in health 1. The HMP Consortium has reported the structure and function of the human microbiome in 300 healthy adults at 18 body sites from a single time point 2,3. Using additional data collected over the course of 12–18 months, we applied Dirichlet multinomial mixture models 4 to partition the data into community types for each body site and made three important observations. First, at several body sites, individuals' community types were strongly associated with whether they had been breastfed as infants, their gender, and their level of education. Second, although the specific taxonomic compositions of the oral and gut microbiomes were different, the community types observed at these sites were predictive of each other. Finally, over the course of the sampling period, the community types from sites within the oral cavity were the least stable, while those in the vagina and gut were the most stable. Our results demonstrate that even with the considerable intra- and interpersonal variation in the human microbiome, this variation can be partitioned into community types that are predictive of each other and are likely the result of life-history characteristics. Understanding the diversity of community types and the mechanisms that result in an individual having a particular type or changing types will allow us to use community types to assess disease risk and to personalize therapies.
Building upon previous analyses of a healthy cohort of 300 individuals, we analyzed a 16S rRNA gene sequence dataset generated by the HMP Consortium 2,3. The final data release for this cohort provided 16S rRNA gene sequence data and clinical metadata (Extended Data Table 1) from 15 body sites in men and 18 in women 5, sampled at two time points for each of the 300 individuals and at a third time point for 100 of them; the interval between samplings ranged from 30 to 451 days (median=224 days). A significant difficulty in analyzing microbiome data has been the considerable intra- and interpersonal variation in the composition of the human microbiome 3,6,7. A recently proposed approach for overcoming this difficulty within the gastrointestinal tract is the concept of enterotypes or, more generically, stool community types 4,8,9. In this approach, samples are clustered into bins based on their taxonomic similarity. Specific enterotypes have been associated with the amount of protein, fat, and carbohydrates in one's diet, obesity, inflammatory bowel disease, and Crohn's disease 4,9–11. Others have found associations between specific vaginal community types and the sexually transmitted Trichomonas vaginalis, pH, and ethnicity 12–14, and between skin community types and psoriasis 15. Using bacterial community structures collected from 18 body sites at up to three time points, we applied community typing analysis to better understand the factors that affect the structure of the microbiome and contribute to human health.
Concern has been expressed regarding whether community types reflect the partitioning of an abundance gradient or the presence of discrete clusters of relative abundance profiles 8. Two general approaches have been developed to assign samples to community types: partitioning around medoids (PAM) and Dirichlet multinomial mixture (DMM) models 4,8. To compare these methods, we first generated simulated communities containing either one or four community types. Analysis of the simulated communities indicated that the negative log model evidence metric used by the DMM-based approach was superior to the metrics used to assess clusters within the PAM-based approach (Supplementary Information). Next, we assigned the samples from each body site to community types using both methods. At every body site, the community types identified using DMM had a lower negative log model evidence, and therefore a better fit, than those identified using the PAM-based approach (Extended Data Table 2; Extended Data Figure 1). Thus, our analyses of the simulated data and the HMP data suggest that the community types represent clusters of relative abundance profiles rather than partitions of an abundance gradient.
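The DMM approach 4 treats each community type as a Dirichlet distribution over taxon proportions, so each sample's counts are scored with a Dirichlet-multinomial likelihood. The Python sketch below shows only that per-sample likelihood and its mixture form, under the assumption that the Dirichlet parameters and mixing weights are already known; the function names `dm_log_pmf` and `dmm_log_likelihood` are hypothetical, and parameter fitting and the Laplace approximation to the model evidence are omitted.

```python
import math
from typing import Sequence


def dm_log_pmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of one sample's taxon counts under a Dirichlet-multinomial
    distribution with concentration parameters `alpha` (one per taxon)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    alpha_total = sum(alpha)
    # Multinomial coefficient: log n! - sum_i log x_i!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Dirichlet normalizing terms: log Gamma(A) - log Gamma(n + A)
    log_norm = math.lgamma(alpha_total) - math.lgamma(n + alpha_total)
    # Per-taxon terms: sum_i [log Gamma(x_i + alpha_i) - log Gamma(alpha_i)]
    log_taxa = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_taxa


def dmm_log_likelihood(counts: Sequence[int],
                       weights: Sequence[float],
                       alphas: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of one sample under a mixture of K Dirichlet-multinomial
    components with mixing weights `weights` (assumed to sum to 1)."""
    component_logs = [math.log(w) + dm_log_pmf(counts, a)
                      for w, a in zip(weights, alphas)]
    # Log-sum-exp for numerical stability
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))
```

As a sanity check, with a flat prior over two taxa (`alpha = [1.0, 1.0]`) every split of two reads is equally likely, so `math.exp(dm_log_pmf([1, 1], [1.0, 1.0]))` equals 1/3 up to floating-point rounding.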
Using the DMM-based approach, we identified between 2 (anterior nares) and 7 (tongue dorsum) community types per body site (see Source Data for community data and DMM fits). For example, bacteria from stool samples fell into four distinct community types (Fig. 1A). We observed that 63 genera were needed to account for 90% of the difference between a model with a single community type and one with four community types (see Source Data). Thus, the types were not differentiated merely by the most abundant bacterial population, as has been previously reported (e.g. Bacteroides, Prevotella, or Ruminococcus) 8–10,16; rather, community types were identified based on complex configurations of numerous taxa. This is consistent with the original enterotype study, which proposed that the taxa typifying each enterotype represent networks of co-occurring bacterial populations 9. Inspection of the five most important genus-level phylotypes, which accounted for 54% of the difference, indicated that each community type had a unique profile of relative abundances (Fig. 1B). Community Type A had the highest levels of Bacteroides but lacked Prevotella and Ruminococcaceae. Like Community Type A, Community Type C also lacked Prevotella, but it had a lower relative abundance of Bacteroides and higher levels of Alistipes, Faecalibacterium, and Ruminococcaceae. Community Type D had fewer Bacteroides than Community Types A and C but had higher levels of Prevotella. Community Type B had the fewest Bacteroides and was dominated by a variety of populations affiliated with the Firmicutes. Furthermore, a comparison of the diversity of the samples assigned to each community type showed that type A had significantly lower diversity than the other three types (p<0.001). Community Types A, C, and D resembled the previously identified Bacteroides, Ruminococcus, and Prevotella enterotypes, respectively 9,10,16. Analysis of the other body sites yielded analogous patterns.
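The statement that 63 genera accounted for 90% of the difference between the one- and four-type models implies a cumulative ranking of per-genus contributions. A minimal Python sketch of that tally follows. How each genus's contribution is measured is not specified here, so the `contributions` values are assumed inputs (for example, per-genus differences between the fitted models); the function name and the example taxa are hypothetical.

```python
from typing import Dict, List


def taxa_needed(contributions: Dict[str, float], fraction: float = 0.90) -> List[str]:
    """Return the smallest set of taxa, ranked by contribution, whose summed
    contribution reaches `fraction` of the total."""
    total = sum(contributions.values())
    selected: List[str] = []
    running = 0.0
    # Rank taxa from largest to smallest contribution
    for taxon, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(taxon)
        running += value
        if running >= fraction * total:
            break
    return selected


# Hypothetical contributions, for illustration only
example = {"genus_a": 50.0, "genus_b": 30.0, "genus_c": 15.0, "genus_d": 5.0}
print(taxa_needed(example))  # ['genus_a', 'genus_b', 'genus_c']
```

The length of the returned list is the number of taxa needed; the same function with `fraction=0.54` would give the taxa accounting for 54% of the difference.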
Figure 1. Analysis of stool samples reveals four community types.
(A) Fitting the genus-level relative abundance data from 597 stool samples to Dirichlet multinomial mixture models provided support for four types when using the Laplace approximation to the negative log model evidence. (B) The relative abundance of the most abundant genera in the samples assigned to each of the types (the boxes represent the interquartile range and the error bars represent the 95% confidence interval; N=221, 15, 80, and 281 for Community Types A, B, C, and D, respectively). Stool community types (N=287 unique individuals) were significantly associated with whether the subject was breastfed as an infant (C; median P=1×10−4) and with their gender (D; median P=4×10−4).
Using the responses that subjects gave to an extensive survey (summarized in Extended Data Table 1), we identified demographic and life-history characteristics that were correlated with different community types at each body site. Of the numerous characteristics tested, we observed significant associations between community types and whether the subject was ever breastfed, their gender, and their education level (Source Data). Whether an individual was ever breastfed was strongly associated with their stool community type (P=1×10−4; Fig. 1C): individuals who were breastfed as infants were 2.4-times more likely to belong to community type A, and those who were not breastfed were 2.2-times more likely to belong to community type D. Gender was associated with the community types identified in the stool (P=4×10−4; Fig. 1D), tongue (P=2×10−3; Extended Data Figure 2A), right retroauricular crease (P=9×10−5; Extended Data Figure 2B), and right antecubital fossa (P=3×10−5; Extended Data Figure 2C). For example, men were 3.0-times more likely than women to harbor stool Community Type D (Fig. 1D). Whether a woman had a baccalaureate degree was strongly associated with the community types observed within the vaginal introitus (P=2×10−3; Extended Data Figure 3A), mid vagina (P=8×10−4; Extended Data Figure 3B), and posterior fornix (P=4×10−4; Extended Data Figure 3C). At each of these sites, the communities of women with a baccalaureate degree were more likely to be dominated by Lactobacillus (type E), whereas those of women without a baccalaureate degree were more likely to have very low levels of Lactobacillus and moderate abundances of Atopobium, Prevotella, Bifidobacterium, and unclassified members of the Firmicutes (type D). Together, our analysis indicates that an individual's life-history characteristics can be associated with their microbiome composition.
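Statements such as "2.4-times more likely" can be read as a ratio of conditional proportions taken from the contingency table of community type by subject group. The Python sketch below assumes that reading (a risk ratio); the function name and the counts are hypothetical and are not the study's data.

```python
from typing import Dict

# group -> community type -> number of subjects
Table = Dict[str, Dict[str, int]]


def times_more_likely(table: Table, group: str, reference: str, community_type: str) -> float:
    """Ratio of P(community_type | group) to P(community_type | reference)."""
    def proportion(g: str) -> float:
        row = table[g]
        return row.get(community_type, 0) / sum(row.values())

    return proportion(group) / proportion(reference)


# Hypothetical counts, for illustration only
counts = {
    "breastfed": {"A": 40, "D": 60},
    "not_breastfed": {"A": 10, "D": 90},
}
print(times_more_likely(counts, "breastfed", "not_breastfed", "A"))  # 4.0
```

An odds ratio would be an alternative summary of the same table; the two agree only when the community type is rare in both groups.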
Our second important observation was that the community type at one body site was predictive of the community type at another body site. Previously, cross-body-site comparisons were made by calculating the ecological distance between samples collected at different body sites based on the taxonomic composition of those communities 3. Our approach not only allowed us to identify similar associations within a body region (e.g. oral, skin, vagina), but also allowed us to detect associations between communities that had very different taxonomic compositions. Community type membership was correlated among sites within the oral cavity, within the vagina, between the left and right antecubital fossae, and between the left and right retroauricular creases (Fig. 2). Surprisingly, stool community types were significantly associated with those of sites within the oral cavity; the strongest association was with the community types observed in saliva (P=1×10−3; Extended Data Table 3). Saliva was dominated by members of Prevotella, Streptococcus, the Pasteurellaceae, Veillonella, and Fusobacterium; among these taxa, only Prevotella was abundant in the stool communities. Individuals with stool community type D, which had the highest level of Prevotella, were 2.1-times more likely to harbor saliva community types A and C, which were also high in Prevotella relative to saliva community types B and D. Stool community types A and C, which had low levels of Prevotella, were less likely to co-occur with saliva community types A and C (Extended Data Table 3). These results are intriguing because they suggest that, although the oral and stool communities shared little taxonomic resemblance, oral bacterial populations may seed the gut and, after experiencing the ecological milieu of the gut, give rise to consistent community types by the time they reach the stool.
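Cross-site associations were assessed with Fisher's exact test on contingency tables of community type membership (Fig. 2). For intuition, the Python sketch below implements the two-sided test for the simplest case, a 2×2 table (for example, stool type D versus other stool types, crossed with saliva types A or C versus B or D). The study's tables had more rows and columns, so this collapsed form is a simplification; the function name is hypothetical, and the two-sided P-value uses the common convention of summing all tables no more probable than the observed one.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]].

    Conditioning on the row and column totals, the top-left cell follows a
    hypergeometric distribution; the P-value sums the probabilities of all
    tables that are no more probable than the observed one."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def table_probability(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties
    p_value = sum(p for p in (table_probability(x) for x in range(lowest, highest + 1))
                  if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70, about 0.486
```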
Figure 2. Community-type associations are strongest within a body region, but also exist between stool and the oral cavity.
Heatmap colors represent the magnitude of the median P-value for the comparison of community type membership using Fisher’s Exact Test. Median P-values are found in the Source Data.
Aside from life-history characteristics and inoculation from other body sites, the structure of the human microbiome is likely shaped by an individual's recent interactions with their environment, diet, medications, and overall health. We quantified the stability of each community type at every body site by estimating the probability that the type would change between sampling visits (Fig. 3A). The most stable body sites were the stool and vagina, and the least stable site was the supragingival plaque. Among the four stool community types, type D was the most stable, followed by types A, C, and B (Fig. 3B). Unfortunately, the available metadata describing changes in health or lifestyle were insufficient to explain why community types changed.
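The probability that a community type changes between visits can be estimated by counting transitions between consecutive visits. The Python sketch below assumes that each subject's community types at one body site are listed in visit order and that every consecutive pair of visits counts as one transition; the function names and the example histories are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_percentages(types_by_subject: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Percentage of consecutive-visit transitions from each type to each type."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for visits in types_by_subject.values():
        # Pair each visit with the next one
        for before, after in zip(visits, visits[1:]):
            counts[before][after] += 1
    percentages = {}
    for before, row in counts.items():
        total = sum(row.values())
        percentages[before] = {after: 100.0 * n / total for after, n in row.items()}
    return percentages


def change_probability(types_by_subject: Dict[str, List[str]], community_type: str) -> float:
    """Fraction of transitions starting in `community_type` that end in a different type."""
    row = transition_percentages(types_by_subject).get(community_type, {})
    if not row:
        raise ValueError(f"no transitions start in type {community_type!r}")
    return 1.0 - row.get(community_type, 0.0) / 100.0


# Hypothetical visit histories for one body site
visits = {"s1": ["A", "A", "C"], "s2": ["A", "D"], "s3": ["D", "D"]}
print(change_probability(visits, "A"))
```

The nested dictionary returned by `transition_percentages` holds the kind of numbers that label the directed edges in a diagram such as Fig. 3B.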
Figure 3. Dynamics of community types at various body sites suggest that community type stability is correlated with the diversity of the community type.
(A) The community types at each body site differ in the fraction of samples that changed their community type membership between visits (the size of each circle represents the percentage of samples affiliated with each community type, and the vertical line represents the weighted average). (B) Rate of change between stool community types (N=221, 15, 80, and 281 for Community Types A, B, C, and D, respectively). The numbers on the directed edges indicate the percentage of samples that changed community types.
The human microbiome is a complex ecosystem that varies considerably across the body and between individuals. This study demonstrates that, given the myriad permutations of genetics, life histories, behaviors, environments, and exposures, an individual's microbiome is an emergent property whereby a potentially limitless number of microbial community structures can be distilled into a finite number of types. Knowledge of the factors that affect one's community type profile will be critical as community types continue to be associated with predisposition to disease. Furthermore, understanding why community types change will be useful in developing therapies that can alter one's community type using pre- and probiotics, fecal transplants, or antibiotics. Given the varying levels of flux between community types at different body sites, it is remarkable that we were still able to detect life-long legacy effects on the microbiome, such as whether the subject was ever breastfed as an infant. This result could represent a true long-term impact of breastfeeding on the microbiome, or it could reflect the effect of the individual's childhood environment or care. It also raises the possibility that there are other legacy effects on the microbiome, such as the duration of breastfeeding, mode of birth, level of early antibiotic exposure, and childhood disease 17–19. The four gender-based associations are intriguing and support previous studies showing that men and women have different skin communities 20 and that autoimmune diseases may be mediated via the microbiome and hormonal differences 21. The association between one's level of education and vaginal community type is harder to interpret; it is likely that a baccalaureate degree represents a composite variable of numerous factors known to affect the vaginal microbiome, including race/ethnicity, sexual behavior, and socioeconomic class.
Regardless, the considerable variation observed among a population of healthy women supports the observation that there is no single normal vaginal microbiome 22; this is likely true for every body site. Looking forward, prospective studies that include individuals with varying levels of health and diverse backgrounds, and that are therefore more representative of society, are needed to better understand the mechanisms by which community types change and to flesh out correlations between community type and life-history factors such as genetics, age, diet, health status, and environment (e.g. rural or urban). Furthermore, future prospective studies with a longitudinal component need to control for the time between samplings and perhaps synchronize sampling with host physiology (e.g. menses). Perhaps most exciting is the prospect that community types may be associated with complex diseases such as bacterial vaginosis, periodontitis, cancer, and diabetes, for which it has not been possible to establish a causative relationship between a specific bacterium and the disease.
Methods (online only)
Sequence analysis pipeline
The Human Microbiome Project sequenced the 16S rRNA gene in three phases, all using the 454 Titanium sequencing platform. We obtained the unprocessed sff files for the V35 region from the NCBI Short Read Archive (SRA) for each of these phases: the Clinical Pilot Project (accession SRP002012), Phase I (accession SRP002395), and Phase II (accession SRP002860). The Clinical Pilot Project and Phase I datasets have been described previously 2,3. Reads were generated by sequencing from the 3' end toward the 5' end of the 16S rRNA gene 25. Although the V13 and V69 regions were also sequenced by the HMP sequencing centers, considerably fewer datasets were generated for those regions than for the V35 region. The 16S rRNA gene sequence curation pipeline was implemented using the mothur software package 23,24. This approach has been shown to result in a sequencing error rate of 0.02% 24. Briefly, flowgrams were extracted from the sff files, and any that had more than one mismatch to the barcode, more than two mismatches to the primer, fewer than 450 flows, a homopolymer longer than 8 nt, or an ambiguous base call were culled. The flows for each sequencing run were trimmed to 450 flows and denoised separately using the PyroNoise algorithm as implemented within mothur 26. The denoised sequences were then aligned against a customized reference alignment based on the SILVA database using the NAST algorithm implemented within mothur 27. The customized database included small subunit rRNA sequences from bacteria, archaea, eukarya, chloroplasts, and mitochondria. Sequences that did not align to the predicted V35 region were culled from further analysis, and the alignments were trimmed so that the sequences fully overlapped the same alignment coordinates 28,29.
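The culling criteria above can be expressed as a single predicate. The Python sketch below applies the thresholds stated in the text to a simplified read record; the `Read` fields are hypothetical stand-ins for values that would be derived from a flowgram and its base calls, and this is an illustration of the rules rather than mothur's implementation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Read:
    """Simplified, hypothetical summary of one flowgram and its base calls."""
    sequence: str
    barcode_mismatches: int
    primer_mismatches: int
    n_flows: int


def longest_homopolymer(sequence: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(sequence.upper())), default=0)


def passes_quality_filter(read: Read,
                          max_barcode_mismatches: int = 1,
                          max_primer_mismatches: int = 2,
                          min_flows: int = 450,
                          max_homopolymer: int = 8) -> bool:
    """Apply the culling thresholds described in the text."""
    if read.barcode_mismatches > max_barcode_mismatches:
        return False
    if read.primer_mismatches > max_primer_mismatches:
        return False
    if read.n_flows < min_flows:
        return False
    if longest_homopolymer(read.sequence) > max_homopolymer:
        return False
    # Any base call other than A, C, G, or T is treated as ambiguous
    if any(base not in "ACGT" for base in read.sequence.upper()):
        return False
    return True
```

Keeping the thresholds as keyword arguments makes it straightforward to see how sensitive the retained read count is to each criterion.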
These sequences were then subjected to a pre-clustering step that sorted the sequences by their abundance within each sample and merged each sequence into a more abundant sequence if the two differed by no more than 2 nt 24. Treating each sample separately, we screened each sequence for chimeras using the de novo UChime chimera detection algorithm 30. Once chimeric sequences were culled from the datasets, the sequences were classified using the naïve Bayesian classifier, as implemented within mothur, trained against a customized version of the RDP training set (version 9) 31. The training set was customized by adding sequences derived from chloroplasts, mitochondria, and members of the Eukarya, and the reference sequences were trimmed to include only the V35 region of the 16S rRNA gene. We required a minimum classification confidence score of 80% and used 1,000 pseudo-bootstrap iterations. Because the PCR target was the bacterial 16S rRNA gene, we culled any sequences that were classified as archaea, eukarya, mitochondria, or chloroplasts, as well as sequences that could not be classified to a kingdom with at least 80% confidence. The taxonomy of the remaining sequences was used to assign them to genus-level phylotypes; sequences without a genus-level classification were assigned to a phylotype represented by the lowest taxonomic level with a confidence score of at least 80%. This allowed us to create a table of counts for the number of times each genus-level phylotype was observed in each sample. Because some samples were sequenced multiple times to obtain additional sequence data, we pooled replicate sequencing runs to create a single sample. Samples with fewer than 1,000 reads were removed from further analysis, and all remaining samples were either subsampled or rarefied (N=1,000 iterations) to 1,000 reads for subsequent analyses.
Subsampling and rarefaction were necessary to limit the effects of uneven sampling depth, which is known to affect alpha and beta diversity metrics and to differentially increase the representation of PCR and sequencing artifacts in datasets.
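Subsampling draws a fixed number of reads without replacement from each sample, and rarefaction repeats that draw and averages the result. The Python sketch below illustrates both at the depth of 1,000 reads used here; the function names are hypothetical, and averaging per-taxon counts is shown only for simplicity, since the same loop could average any metric computed on each subsample.

```python
import random
from collections import Counter
from typing import Dict, Optional


def subsample(counts: Dict[str, int], depth: int = 1000,
              rng: Optional[random.Random] = None) -> Dict[str, int]:
    """Draw `depth` reads without replacement from a sample's taxon counts."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than depth {depth}")
    rng = rng if rng is not None else random.Random()
    # Expand counts into one entry per read, then sample without replacement
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


def rarefied_mean_counts(counts: Dict[str, int], depth: int = 1000,
                         iterations: int = 1000, seed: int = 0) -> Dict[str, float]:
    """Average per-taxon counts over repeated subsamples (rarefaction)."""
    rng = random.Random(seed)
    totals: Counter = Counter()
    for _ in range(iterations):
        totals.update(subsample(counts, depth, rng))
    return {taxon: n / iterations for taxon, n in totals.items()}
```

Samples below the target depth raise an error rather than being silently kept, mirroring the removal of samples with fewer than 1,000 reads.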
Assignment to community types
The table of counts was partitioned according to the 18 body sites. Because their communities were similar, and because we wanted to use the maximum number of samples per body site when assigning samples to community types, we pooled the three vaginal body sites (i.e. vaginal introitus, mid-vagina, and posterior fornix) into one vaginal dataset, the two antecubital fossa sites (i.e. left and right) into one antecubital fossa dataset, and the two retroauricular crease sites (i.e. left and right) into one retroauricular crease dataset. The resulting 14 tables were used as input to partition the samples according to community types at each body site using the Dirichlet multinomial mixture model 4. We chose the number of community types at each body site as the number of components that gave the minimum Laplace approximation to the negative log model evidence. Each sample was assigned to the community type with the maximum posterior probability. For all body sites, between 89.2% and 99.7% of the samples had a posterior probability of at least 0.90. The mean abundance and 95% confidence interval predicted by the model are provided for each body site.
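Once the model fits are available, the selection and assignment rules above reduce to two small steps: pick the number of components with the smallest Laplace value, then give each sample the type with the largest posterior probability. The Python sketch below assumes the Laplace values and per-sample posterior probabilities have already been computed by a DMM fit; the function names and example numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def choose_num_types(laplace_by_k: Dict[int, float]) -> int:
    """Number of components with the minimum Laplace approximation to the
    negative log model evidence."""
    return min(laplace_by_k, key=laplace_by_k.__getitem__)


def assign_types(posteriors: Sequence[Sequence[float]],
                 threshold: float = 0.90) -> Tuple[List[int], float]:
    """Assign each sample to its maximum-posterior type (ties go to the first
    type) and report the fraction of samples whose maximum posterior is at
    least `threshold`."""
    assignments = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    confident = sum(1 for p in posteriors if max(p) >= threshold) / len(posteriors)
    return assignments, confident


# Hypothetical values, for illustration only
print(choose_num_types({1: 100.0, 2: 90.0, 3: 85.0, 4: 87.0}))  # 3
print(assign_types([[0.95, 0.05], [0.30, 0.70], [0.50, 0.50]]))
```

The second value returned by `assign_types` corresponds to the per-site fraction of samples with a posterior probability of at least 0.90 reported above.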
Selection of metadata
A large amount of metadata and clinical data were collected for each of the samples and subjects 5. We obtained the most recent version of these data from dbGaP (accession phs000228.v3.p1). Because of the uniformity and healthy nature of the cohort, a number of the clinical data fields could not be included in our analysis. Furthermore, there was evidence that several variables were collected from subjects in one city but not the other (see Supplementary Information for a discussion of the difficulties in analyzing the city of origin data). We interrogated, a priori, the categorical metadata to identify those variables for which there were at least 10 instances of the condition and that were represented in subjects from both cities. In addition, to increase the number of variables under consideration, we pooled responses. For example, there were 13 categories for country of birth, only one of which had more than 10 respondents (US/Canada; N=260); in this case we pooled the other responses to create a non-US/Canada group (N=40). We used a similar pooling strategy for parents' country of origin, meat eaters/vegetarians, number of deliveries, occupation, and level of education. The data available through dbGaP partitioned medications into broad categories and indicated whether the subject was using the medication at the time of each visit. This created three classes of subjects: the first never used the medication during the study, the second used the medication at one or two of their visits but not at all of them, and the third used the medication at all of their visits. For the purpose of correlating medication usage with community type, we used the data for the first and third classes of subjects and ignored the second. The number of subjects in the second class was below 10 for each type of medication.
For example, there were only 8 subjects that had more than one visit and used antibiotics within 30 days of any of their visits. Because of the general paucity of subjects in this category of medication users, it was not possible to associate medication usage with changes in community type. Finally, we converted the subjects’ body mass index (BMI) into categories of normal (18.5–25), overweight (25–30), and obese (>30); there were no underweight subjects (<18.5). The resulting list of categorical clinical data that were considered is provided in Extended Data Table 1. In addition to these categorical data, we also had access to continuous clinical metadata for each of the subjects. These included their age, BMI, pulse, and blood pressure. A summary of these data is provided in Extended Data Table 1.
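The pooling and binning rules above can be sketched in Python as follows. The bin edges follow the text, but assigning boundary values (exactly 25 or 30) to the higher category is an assumption, as is keeping a category only when it has at least `min_count` respondents; the function names and example responses are hypothetical.

```python
from collections import Counter
from typing import List


def bmi_category(bmi: float) -> str:
    """Bin BMI; boundary values go to the higher category (an assumption)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def pool_rare_categories(responses: List[str], min_count: int = 10,
                         other_label: str = "other") -> List[str]:
    """Replace responses given by fewer than `min_count` subjects with `other_label`."""
    counts = Counter(responses)
    return [r if counts[r] >= min_count else other_label for r in responses]


# Hypothetical responses, for illustration only
births = ["US/Canada"] * 12 + ["country_x"] * 2 + ["country_y"] * 3
print(Counter(pool_rare_categories(births)))
```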
Tests of association
Because individuals provided up to three samples, selecting a single visit from each subject (e.g. the first visit) on which to base our analyses would have been arbitrary. For the categorical metadata, we therefore used an iterative procedure in which we selected a single visit for each individual's body site and tested the association between the community type at that body site and the metadata using Fisher's exact test. For the continuous metadata, we used a similar procedure, except that we tested the association between community type and the metadata using analysis of variance. Finally, to test associations across the body, we used a randomization procedure in which each iteration consisted of selecting one visit for each individual and then testing for inter-body-site associations using Fisher's exact test. We performed 1,000 iterations and, for each variable, calculated the percentage of iterations in which it was significant after applying the Benjamini–Hochberg step-up procedure to limit the false discovery rate to 5%. We report the median P-value and the percentage of iterations that were significant.
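The resampling procedure above can be sketched as a loop that draws one visit per subject, runs the association tests, applies the Benjamini–Hochberg step-up rule across the tested variables, and records each variable's P-value and significance. In this Python sketch the statistical test is abstracted as a caller-supplied function (`run_tests`, hypothetical) standing in for the Fisher's exact tests or analyses of variance described above; applying the correction across all variables tested within one iteration is also an assumption.

```python
import random
import statistics
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Step-up procedure: find the largest rank k with p_(k) <= k/m * fdr and
    declare the k smallest P-values significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff_rank
    return significant


def summarize_iterations(visits_by_subject: Dict[str, List[str]],
                         run_tests: Callable[[Dict[str, str]], Dict[str, float]],
                         iterations: int = 1000, fdr: float = 0.05,
                         seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Return {variable: (median P-value, % of iterations significant)}."""
    rng = random.Random(seed)
    p_history: Dict[str, List[float]] = defaultdict(list)
    hits: Counter = Counter()
    for _ in range(iterations):
        # Pick one visit (e.g. a sample identifier) per subject
        selection = {subject: rng.choice(visits) for subject, visits in visits_by_subject.items()}
        results = run_tests(selection)  # variable -> P-value
        names = list(results)
        flags = benjamini_hochberg([results[name] for name in names], fdr)
        for name, flag in zip(names, flags):
            p_history[name].append(results[name])
            hits[name] += int(flag)
    return {name: (statistics.median(ps), 100.0 * hits[name] / iterations)
            for name, ps in p_history.items()}
```

Reporting the median rather than the mean P-value keeps the summary robust to the occasional iteration whose visit selection yields an extreme result.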
Extended Data
Extended Data Figure 1. Comparison of community type assignments using DMM (A) and PAM-based clustering (B), shown on a non-metric multidimensional scaling (NMDS) ordination of Jensen–Shannon divergence values between stool samples.
The stress computed for this ordination was 0.19, and the R² between the input distance matrix and the distance matrix calculated between the points in the ordination was 0.90.
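Extended Data Figure 1 ordinates stool samples by their Jensen–Shannon divergence. A minimal Python sketch of that divergence between two abundance profiles follows. The logarithm base (2 here, which bounds the divergence between 0 and 1) is an assumption, since the base used for the figure is not stated, and some analyses use the square root of this quantity (the Jensen–Shannon distance) instead; the function name is hypothetical.

```python
import math
from typing import Sequence


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float],
                              base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two abundance profiles of equal length.
    Inputs are normalized to proportions first, so raw counts are accepted."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Kullback-Leibler divergence; terms with a_i = 0 contribute nothing
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)
```

Computing this value for every pair of samples gives the distance matrix that an NMDS ordination takes as input.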
Extended Data Figure 2.
Percentage of female and male subjects whose communities affiliated with each of the tongue (A; N=288 unique individuals; median P=2×10−3), right retroauricular crease (B; N=268 unique individuals; median P=9×10−5), and right antecubital fossa (C; N=136 unique individuals; median P=3×10−5) community types.
Extended Data Figure 3.
Percentage of women with and without a college degree whose vaginal communities affiliated with the vaginal introitus (A; N=74 unique individuals; median P=2×10−3), mid-vagina (B; N=64 unique individuals; median P=8×10−4), and posterior fornix (C; N=61 unique individuals; median P=4×10−4) community types.
Extended Data Table 1. Most common characteristics of the individuals included in the HMP healthy cohort.
Categorical data | Number of Individuals (Total=300) |
---|---|
Sampled in Houston / St. Louis | 150 / 150 |
Female / Male | 151 / 149 |
Born in US or Canada | 260 |
Mother born in US or Canada | 227 |
Father born in US or Canada | 229 |
Hispanic, Latino, or Spanish | 32 |
Asian | 34 |
Black | 21 |
White | 243 |
Ever breastfed as infant | 198 (Forgot=33, NA=23) |
Eats meat at least once a week | 261 (NA=23) |
Occupation: Student | 150 |
Had given birth at least once | 26 (NA=156) |
College educated | 210 (NA=16) |
Had dental insurance | 265 (NA=16) |
Had health insurance | 217 (NA=16) |
Tobacco user | 19 |
Chronic use of antidepressants | 22 (Tr=9) |
Chronic use of antihistamines | 9 (Tr=13) |
Chronic use of hormonal contraceptives | 72 (Tr=15; NA=149) |
Chronic use of vitamins or supplements | 34 (Tr=10) |
Normal BMI | 178 |
Overweight BMI | 88 |
Obese BMI | 34 |
Continuous data | Median (Min-Max) |
Age | 25 (18–40) |
BMI | 24 (19–34) |
Pulse | 71 (42–100) |
Diastolic pressure | 71 (50–98) |
Systolic pressure | 119 (91–151) |
pH – Vaginal introitus | 4.4 (3.3–6.5) |
pH – Posterior fornix | 4.0 (3.2–7.0) |
Extended Data Table 2. Comparison of PAM and DMM-based approaches to assigning samples to community types.
Body site | PAM-based (SI Index): Clusters | PAM-based (SI Index): SI Index | PAM-based (SI Index): Laplace | PAM-based (CH Index): Clusters | PAM-based (CH Index): CH Index | PAM-based (CH Index): Laplace | DMM-based: Clusters | DMM-based: Laplace |
---|---|---|---|---|---|---|---|---|
Antecubital fossa | 2 | 0.34 | 84858.4 | 2 | 114.3 | 84858.4 | 3 | 83302.1 |
Anterior nares | 3 | 0.32 | 52136.3 | 2 | 153.5 | 51864.3 | 2 | 51532.0 |
Buccal mucosa | 2 | 0.23 | 65643.3 | 4 | 166.2 | 64968.1 | 4 | 64588.8 |
Hard palate | 2 | 0.28 | 72686.9 | 4 | 208.4 | 71573.8 | 4 | 71436.9 |
Keratinized gingiva | 2 | 0.38 | 51392.2 | 2 | 323.8 | 51392.2 | 5 | 50605.3 |
Palatine tonsils | 2 | 0.27 | 82655.3 | 2 | 237.7 | 82655.3 | 6 | 81446.7 |
Retroauricular crease | 3 | 0.51 | 95797.5 | 3 | 719.9 | 95797.5 | 5 | 94673.5 |
Saliva | 2 | 0.19 | 81261.3 | 2 | 120.4 | 81261.3 | 4 | 80656.1 |
Stool | 2 | 0.40 | 76228.5 | 2 | 194.0 | 76228.5 | 4 | 74785.6 |
Subgingival plaque | 2 | 0.25 | 90876.7 | 2 | 249.4 | 90876.7 | 5 | 89672.2 |
Supragingival plaque | 2 | 0.24 | 78982.0 | 2 | 217.6 | 78982.0 | 6 | 78357.1 |
Throat | 2 | 0.26 | 79238.0 | 2 | 177.8 | 79238.0 | 5 | 78052.8 |
Tongue dorsum | 2 | 0.33 | 71442.3 | 2 | 293.2 | 71442.3 | 7 | 69923.0 |
Vagina | 10 | 0.57 | 32407.3 | 2 | 205.7 | 32150.8 | 5 | 31209.5 |
Extended Data Table 3. Average contingency table of stool and saliva community types.
The median P-value from a Fisher’s Exact Test was 1×10−3.
 | Saliva A | Saliva B | Saliva C | Saliva D |
---|---|---|---|---|
Stool A | 0.101 | 0.140 | 0.104 | 0.044 |
Stool B | 0.003 | 0.000 | 0.002 | 0.021 |
Stool C | 0.136 | 0.173 | 0.107 | 0.027 |
Stool D | 0.048 | 0.024 | 0.052 | 0.017 |
Supplementary Material
Source Data Figure 1
Source Data Figure 2
Source Data Figure 3
Acknowledgements
We thank Jonathan Crabtree of the HMP Data Analysis and Coordination Center for his assistance in obtaining the sequencing and metadata files. The analysis described in this study was supported by grants from the National Institutes of Health (R01HG005975, R01GM099514, and P30DK034933).
Footnotes
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions. TD and PDS designed and executed the analysis and prepared the manuscript.
The authors declare no competing financial interests.
Literature cited
- 1. Peterson J, et al. The NIH Human Microbiome Project. Genome Res. 2009;19:2317–2323. doi: 10.1101/gr.096651.109.
- 2. The Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012;486:215–221. doi: 10.1038/nature11209.
- 3. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–214. doi: 10.1038/nature11234.
- 4. Holmes I, Harris K, Quince C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE. 2012;7:e30126. doi: 10.1371/journal.pone.0030126.
- 5. Aagaard K, et al. The Human Microbiome Project strategy for comprehensive sampling of the human microbiome and why it matters. FASEB J. 2013;27:1012–1022. doi: 10.1096/fj.12-220806.
- 6. Turnbaugh PJ, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457:480–484. doi: 10.1038/nature07540.
- 7. Costello EK, et al. Bacterial community variation in human body habitats across space and time. Science. 2009;326:1694–1697. doi: 10.1126/science.1177486.
- 8. Koren O, et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput Biol. 2013;9:e1002863. doi: 10.1371/journal.pcbi.1002863.
- 9. Arumugam M, et al. Enterotypes of the human gut microbiome. Nature. 2011;473:174–180. doi: 10.1038/nature09944.
- 10. Wu GD, et al. Linking long-term dietary patterns with gut microbial enterotypes. Science. 2011;334:105–108. doi: 10.1126/science.1208344.
- 11. Quince C, et al. The impact of Crohn's disease genes on healthy human gut microbiota: a pilot study. Gut. 2013;62:952–954. doi: 10.1136/gutjnl-2012-304214.
- 12. Brotman RM, et al. Association between Trichomonas vaginalis and vaginal bacterial community composition among reproductive-age women. Sex Transm Dis. 2012;39:807–812. doi: 10.1097/OLQ.0b013e3182631c79.
- 13. Gajer P, et al. Temporal dynamics of the human vaginal microbiota. Sci Transl Med. 2012;4:132ra152. doi: 10.1126/scitranslmed.3003605.
- 14. Ravel J, et al. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A. 2011;108(Suppl 1):4680–4687. doi: 10.1073/pnas.1002611107.
- 15. Statnikov A, et al. Microbiomic signatures of psoriasis: feasibility and methodology comparison. Sci Rep. 2013;3:2620. doi: 10.1038/srep02620.
- 16. Moeller AH, et al. Chimpanzees and humans harbour compositionally similar gut enterotypes. Nat Commun. 2012;3:1179. doi: 10.1038/ncomms2159.
- 17. Palmer C, Bik EM, DiGiulio DB, Relman DA, Brown PO. Development of the human infant intestinal microbiota. PLoS Biol. 2007;5:e177. doi: 10.1371/journal.pbio.0050177.
- 18. Koenig JE, et al. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011;108(Suppl 1):4578–4585. doi: 10.1073/pnas.1000081107.
- 19. Pantoja-Feliciano IG, et al. Biphasic assembly of the murine intestinal microbiota during early development. ISME J. 2013;7:1112–1115. doi: 10.1038/ismej.2013.15.
- 20. Fierer N, Hamady M, Lauber CL, Knight R. The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci U S A. 2008;105:17994–17999. doi: 10.1073/pnas.0807920105.
- 21. Markle JG, et al. Sex differences in the gut microbiome drive hormone-dependent regulation of autoimmunity. Science. 2013;339:1084–1088. doi: 10.1126/science.1233521.
- 22. Ma B, Forney LJ, Ravel J. Vaginal microbiome: rethinking health and disease. Annu Rev Microbiol. 2012;66:371–389. doi: 10.1146/annurev-micro-092611-150157.
- 23. Schloss PD, et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75:7537–7541. doi: 10.1128/AEM.01541-09.
- 24. Schloss PD, Gevers D, Westcott SL. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 2011;6:e27310. doi: 10.1371/journal.pone.0027310.
- 25. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE. 2012;7:e39315. doi: 10.1371/journal.pone.0039315.
- 26. Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ. Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 2011;12:38. doi: 10.1186/1471-2105-12-38.
- 27. Schloss PD. A high-throughput DNA sequence aligner for microbial ecology studies. PLoS ONE. 2009;4:e8230. doi: 10.1371/journal.pone.0008230.
- 28. Schloss PD. Secondary structure improves OTU assignments of 16S rRNA gene sequences. ISME J. 2013;7:457–460. doi: 10.1038/ismej.2012.102.
- 29. Pruesse E, et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35:7188–7196. doi: 10.1093/nar/gkm864.
- 30. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27:2194–2200. doi: 10.1093/bioinformatics/btr381.
- 31. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–5267. doi: 10.1128/AEM.00062-07.
Supplementary Materials
Source Data Figure 1
Source Data Figure 2
Source Data Figure 3