  • Fri Jan 01 2010

Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life

Chris Todd Hittinger et al. Proc Natl Acad Sci U S A. 2010.

Abstract

Assembling the tree of life is a major goal of biology, but progress has been hindered by the difficulty and expense of obtaining the orthologous DNA required for accurate and fully resolved phylogenies. Next-generation DNA sequencing technologies promise to accelerate progress, but sequencing the genomes of hundreds of thousands of eukaryotic species remains impractical. Eukaryotic transcriptomes, which are smaller than genomes and biased toward highly expressed genes that tend to be conserved, could potentially provide a rich set of phylogenetic characters. We sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes. Analysis of these data matrices yielded robust phylogenetic inferences, even with data matrices constructed from surprisingly few sequence reads. This approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods and provides a scalable strategy for generating phylogenomic data matrices to infer the branches and twigs of the tree of life.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Robust phylogenetic inference from short-read next-generation DNA sequencing. (A) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using A. aegypti full transcripts as references under the single-contig strategy. (B) ML phylogeny of data matrix analyzed in A after exclusion of all loci of ambiguous orthology under either assessment strategy. (C) ML phylogeny of data matrix analyzed in A after exclusion of all sites with any missing data or gaps. (D) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using A. gambiae full transcripts as references. (E–H) The same analyses as in A–D but on data matrices constructed by considering all contigs ≥300 bp. Clade support near internodes represents bootstrap support (ML) and posterior probability (Bayesian inference), respectively. Asterisks denote absolute support. Branch lengths represent estimated substitutions per site.

Fig. 2.

Constructed phylogenomic data matrices contain large amounts of orthologous DNA and are capable of yielding robust phylogenetic inferences even after substantial reductions in the amount of input data. (A) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the single-contig strategy with contigs ≥100 bp. (B) Number of resolved internodes in data matrices constructed using the single-contig strategy. (C) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the supercontig strategy with contigs ≥100 bp. (D) Number of resolved internodes in data matrices constructed using the supercontig strategy.
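The three site categories counted in panels A and C can be illustrated with a minimal sketch. It assumes the standard definitions: a site is variable if at least two character states occur among the taxa, and parsimony-informative if at least two states each occur in at least two taxa. The function name `classify_sites`, the treatment of gaps and ambiguous bases as missing data, and the toy alignment are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter

def classify_sites(alignment):
    """Count total, variable, and parsimony-informative columns.

    alignment: list of equal-length aligned sequences, one per taxon.
    Assumption: '-', 'N', and '?' are missing data and are ignored.
    """
    missing = set("-N?")
    total = variable = informative = 0
    for column in zip(*alignment):
        total += 1
        # Tally the observed character states in this column.
        counts = Counter(
            base.upper() for base in column if base.upper() not in missing
        )
        # Variable: at least two distinct states are present.
        if len(counts) >= 2:
            variable += 1
        # Parsimony-informative: at least two states, each in >= 2 taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return total, variable, informative

# Hypothetical four-taxon alignment, for illustration only.
toy = [
    "ACGTAC",
    "ACGTAT",
    "ACCTGT",
    "ACCT-C",
]
print(classify_sites(toy))  # (6, 3, 2)
```

In the toy alignment, columns 3, 5, and 6 are variable, but column 5 is not parsimony-informative because one of its two states appears in only a single taxon.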

Fig. 3.

Base pairs found in phylogenetic data matrices are derived from highly expressed transcripts, especially in data sets constructed from less input data. Expression is plotted against the number of sequence reads used. The average expression of a base pair included in a given supercontig data matrix (contigs ≥100 bp) was quantified from the A. aegypti data relative to the average expression of a base pair in the full A. aegypti transcriptome.
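The ratio described in this caption can be sketched as a length-weighted average. The caption does not say how expression was measured, so the sketch assumes each transcript has a single per-base-pair expression value (for example, mean read coverage) and a known number of base pairs that ended up in the data matrix. The function name `relative_expression`, the tuple layout, and the numbers are hypothetical.

```python
def relative_expression(transcripts):
    """Mean expression of included base pairs relative to all base pairs.

    transcripts: list of (length_bp, expression_per_bp, included_bp) tuples.
    Assumption: expression_per_bp is constant along each transcript.
    """
    total_bp = sum(length for length, _, _ in transcripts)
    included_bp = sum(inc for _, _, inc in transcripts)
    if total_bp == 0 or included_bp == 0:
        raise ValueError("need at least one base pair overall and one included")
    # Average expression of a base pair across the full transcriptome.
    mean_all = sum(length * expr for length, expr, _ in transcripts) / total_bp
    # Average expression of a base pair that is in the data matrix.
    mean_included = sum(inc * expr for _, expr, inc in transcripts) / included_bp
    return mean_included / mean_all

# Hypothetical transcripts: (length, expression per bp, bp in the matrix).
toy_transcripts = [
    (1000, 50.0, 800),  # highly expressed, mostly assembled
    (2000, 5.0, 200),   # moderately expressed, partly assembled
    (3000, 1.0, 0),     # weakly expressed, not assembled
]
ratio = relative_expression(toy_transcripts)
print(round(ratio, 2))
```

A ratio above 1 means the included base pairs come disproportionately from highly expressed transcripts, which is the pattern the figure reports.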
