Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life
- Fri Jan 01 2010
Chris Todd Hittinger et al. Proc Natl Acad Sci U S A. 2010.
Abstract
Assembling the tree of life is a major goal of biology, but progress has been hindered by the difficulty and expense of obtaining the orthologous DNA required for accurate and fully resolved phylogenies. Next-generation DNA sequencing technologies promise to accelerate progress, but sequencing the genomes of hundreds of thousands of eukaryotic species remains impractical. Eukaryotic transcriptomes, which are smaller than genomes and biased toward highly expressed genes that tend to be conserved, could potentially provide a rich set of phylogenetic characters. We sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes. Analysis of these data matrices yielded robust phylogenetic inferences, even with data matrices constructed from surprisingly few sequence reads. This approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods and provides a scalable strategy for generating phylogenomic data matrices to infer the branches and twigs of the tree of life.
Conflict of interest statement
The authors declare no conflict of interest.
Figures

Robust phylogenetic inference from short-read next-generation DNA sequencing. (A) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using A. aegypti full transcripts as references under the single-contig strategy. (B) ML phylogeny of data matrix analyzed in A after exclusion of all loci of ambiguous orthology under either assessment strategy. (C) ML phylogeny of data matrix analyzed in A after exclusion of all sites with any missing data or gaps. (D) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using A. gambiae full transcripts as references. (E–H) The same analyses as in A–D but on data matrices constructed by considering all contigs ≥300 bp. Clade support near internodes represents bootstrap support (ML) and posterior probability (Bayesian inference), respectively. Asterisks denote absolute support. Branch lengths represent estimated substitutions per site.

Constructed phylogenomic data matrices contain large amounts of orthologous DNA and are capable of yielding robust phylogenetic inferences even after substantial reductions in the amount of input data. (A) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the single-contig strategy with contigs ≥100 bp. (B) Number of resolved internodes in data matrices constructed using the single-contig strategy. (C) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the supercontig strategy with contigs ≥100 bp. (D) Number of resolved internodes in data matrices constructed using the supercontig strategy.
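The three site categories named in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `site_counts`, the toy alignment, and the choice to treat `-`, `?`, and `N` as missing data are all assumptions made for illustration. A column is counted as variable when it shows at least two different states, and as parsimony-informative when at least two states each occur in at least two taxa.

```python
from collections import Counter

# Assumption: these symbols mark gaps or missing data and are ignored per column.
GAP_OR_MISSING = set("-?Nn")


def site_counts(alignment):
    """Return (total, variable, parsimony_informative) column counts.

    `alignment` maps a taxon name to its aligned sequence (hypothetical input
    format); all sequences must have the same length.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = len(seqs[0])
    variable = 0
    informative = 0
    for column in zip(*seqs):
        # Count each observed state, skipping gaps and missing data.
        counts = Counter(b.upper() for b in column if b not in GAP_OR_MISSING)
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in at least two taxa.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return total, variable, informative


# Toy alignment (invented data, not from the paper).
toy = {
    "sp1": "ACGTAG-A",
    "sp2": "ACGTTCAA",
    "sp3": "ACGAACAG",
    "sp4": "ACGATCAG",
}
result = site_counts(toy)
print(result)
```

In the toy alignment, the column where only one taxon differs is variable but not parsimony-informative, which is why the two counts diverge.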

Base pairs found in phylogenetic data matrices are derived from highly expressed transcripts, especially in data sets constructed from less input data. Expression is plotted against the number of sequence reads used. The average expression of a base pair included in a given supercontig data matrix (contigs ≥100 bp) was quantified from the A. aegypti data relative to the average expression of a base pair in the full A. aegypti transcriptome.
Similar articles
- Stockenhuber R, Zoller S, Shimizu-Inatsugi R, Gugerli F, Shimizu KK, Widmer A, Fischer MC. Efficient Detection of Novel Nuclear Markers for Brassicaceae by Transcriptome Sequencing. PLoS One. 2015 Jun 10;10(6):e0128181. doi: 10.1371/journal.pone.0128181. PMID: 26061739. Free PMC article.
- Brooks MJ, Rajasimha HK, Roger JE, Swaroop A. Mol Vis. 2011;17:3034-54. Epub 2011 Nov 23. PMID: 22162623. Free PMC article.
- Nagalakshmi U, Waern K, Snyder M. RNA-Seq: a method for comprehensive transcriptome analysis. Curr Protoc Mol Biol. 2010 Jan;Chapter 4:Unit 4.11.1-13. doi: 10.1002/0471142727.mb0411s89. PMID: 20069539.
- Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 Jan;10(1):57-63. doi: 10.1038/nrg2484. PMID: 19015660. Free PMC article. Review.
- Johnson KP. Putting the genome in insect phylogenomics. Curr Opin Insect Sci. 2019 Dec;36:111-117. doi: 10.1016/j.cois.2019.08.002. Epub 2019 Aug 13. PMID: 31546095. Review.
Cited by
- Laumer CE, Hejnol A, Giribet G. Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation. Elife. 2015 Mar 12;4:e05503. doi: 10.7554/eLife.05503. PMID: 25764302. Free PMC article.
- Zou M, Guo B, Tao W, Arratia G, He S. Sci Rep. 2012;2:665. doi: 10.1038/srep00665. Epub 2012 Sep 18. PMID: 22993690. Free PMC article.
- Kamali M, Marek PE, Peery A, Antonio-Nkondjio C, Ndo C, Tu Z, Simard F, Sharakhov IV. Multigene phylogenetics reveals temporal diversification of major African malaria vectors. PLoS One. 2014 Apr 4;9(4):e93580. doi: 10.1371/journal.pone.0093580. PMID: 24705448. Free PMC article.
- Cho S, Zwick A, Regier JC, Mitter C, Cummings MP, Yao J, Du Z, Zhao H, Kawahara AY, Weller S, Davis DR, Baixeras J, Brown JW, Parr C. Syst Biol. 2011 Dec;60(6):782-96. doi: 10.1093/sysbio/syr079. Epub 2011 Aug 16. PMID: 21840842. Free PMC article.
- Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, Kessler D, Clarke WE, Edger PP, Links MG, Sharpe AG, Parkin IA. Polyploid evolution of the Brassicaceae during the Cenozoic era. Plant Cell. 2014 Jul;26(7):2777-91. doi: 10.1105/tpc.114.126391. Epub 2014 Jul 17. PMID: 25035408. Free PMC article.