Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life - PubMed

Fri Jan 01 2010

Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life

Chris Todd Hittinger et al. Proc Natl Acad Sci U S A. 2010.

Abstract

Assembling the tree of life is a major goal of biology, but progress has been hindered by the difficulty and expense of obtaining the orthologous DNA required for accurate and fully resolved phylogenies. Next-generation DNA sequencing technologies promise to accelerate progress, but sequencing the genomes of hundreds of thousands of eukaryotic species remains impractical. Eukaryotic transcriptomes, which are smaller than genomes and biased toward highly expressed genes that tend to be conserved, could potentially provide a rich set of phylogenetic characters. We sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes. Analysis of these data matrices yielded robust phylogenetic inferences, even with data matrices constructed from surprisingly few sequence reads. This approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods and provides a scalable strategy for generating phylogenomic data matrices to infer the branches and twigs of the tree of life.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Robust phylogenetic inference from short-read next-generation DNA sequencing. (A) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using *A. aegypti* full transcripts as references under the single-contig strategy. (B) ML phylogeny of data matrix analyzed in A after exclusion of all loci of ambiguous orthology under either assessment strategy. (C) ML phylogeny of data matrix analyzed in A after exclusion of all sites with any missing data or gaps. (D) ML phylogeny produced from data matrix constructed by considering all contigs ≥100 bp assembled from ∼13 million sequence reads per species using *A. gambiae* full transcripts as references. (E–H) The same analyses as in A–D but on data matrices constructed by considering all contigs ≥300 bp. Clade support near internodes represents bootstrap support (ML) and posterior probability (Bayesian inference), respectively. Asterisks denote absolute support. Branch lengths represent estimated substitutions per site.
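Panels C and G rely on removing every alignment column that contains missing data or gaps. The following is a minimal sketch of such a filter, not the authors' pipeline: the function name `drop_incomplete_sites`, the dictionary input format, and the choice to treat anything other than A, C, G, or T as missing are all assumptions made for illustration.

```python
def drop_incomplete_sites(alignment, valid="ACGT"):
    """Keep only columns where every taxon has an unambiguous base.

    alignment: dict mapping taxon name -> aligned sequence (hypothetical format).
    Gaps ('-'), 'N', and any other symbol outside `valid` count as missing.
    """
    taxa = list(alignment)
    seqs = [alignment[t].upper() for t in taxa]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    kept = [col for col in zip(*seqs) if all(base in valid for base in col)]
    return {t: "".join(col[i] for col in kept) for i, t in enumerate(taxa)}


toy = {"sp1": "ACG-TA", "sp2": "ACGTTA", "sp3": "ANGTCA"}
print(drop_incomplete_sites(toy))
# {'sp1': 'AGTA', 'sp2': 'AGTA', 'sp3': 'AGCA'}
```

Columns 2 and 4 of the toy alignment are dropped because one taxon carries an `N` or a gap there; the remaining four columns are complete for all three taxa.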

**Fig. 2.**
Constructed phylogenomic data matrices contain large amounts of orthologous DNA and are capable of yielding robust phylogenetic inferences even after substantial reductions in the amount of input data. (A) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the single-contig strategy with contigs ≥100 bp. (B) Number of resolved internodes in data matrices constructed using the single-contig strategy. (C) Numbers of total, variable, and parsimony-informative sites in data matrices constructed from different amounts of raw data using the supercontig strategy with contigs ≥100 bp. (D) Number of resolved internodes in data matrices constructed using the supercontig strategy.
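The three site categories in panels A and C can be illustrated with a short sketch. It uses the standard definitions (a variable site shows at least two states; a parsimony-informative site shows at least two states that each occur in at least two taxa). The function name `site_stats`, the input format, and the decision to ignore gaps and ambiguity codes when counting states are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def site_stats(alignment):
    """Return (total, variable, parsimony_informative) site counts.

    alignment: dict mapping taxon name -> aligned sequence (hypothetical format).
    Only unambiguous bases (A, C, G, T) are counted as character states.
    """
    seqs = [s.upper() for s in alignment.values()]
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    total = lengths.pop()
    variable = informative = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
        # Informative: at least two states, each present in two or more taxa.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return total, variable, informative


toy = {
    "sp1": "ACGTACGT-A",
    "sp2": "ACGTTCGTNA",
    "sp3": "ACCTTCGAAA",
    "sp4": "ACCTAGGAAA",
}
print(site_stats(toy))  # (10, 4, 3)
```

In the toy alignment the sixth column is variable but not parsimony-informative, because the `G` state appears in only one taxon.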

**Fig. 3.**
Base pairs found in phylogenetic data matrices are derived from highly expressed transcripts, especially in data sets constructed from less input data. Expression is plotted against the number of sequence reads used. The average expression of a base pair included in a given supercontig data matrix (contigs ≥100 bp) was quantified from the *A. aegypti* data relative to the average expression of a base pair in the full *A. aegypti* transcriptome.
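The ratio described in this caption can be sketched as follows. The caption does not say how per-base expression was measured, so the sketch assumes per-base read depth as a stand-in; the function name `relative_expression`, the input structures, and the toy numbers are hypothetical.

```python
def relative_expression(depth, included):
    """Mean depth of bases in the data matrix divided by mean depth of all bases.

    depth: dict transcript -> list of per-base read depths (assumed proxy
           for expression).
    included: dict transcript -> iterable of 0-based positions that ended up
              in the data matrix.
    """
    all_depths = [d for per_base in depth.values() for d in per_base]
    in_matrix = [depth[t][i] for t, positions in included.items() for i in positions]
    if not all_depths or not in_matrix:
        raise ValueError("need at least one base in both sets")
    mean_all = sum(all_depths) / len(all_depths)
    if mean_all == 0:
        raise ValueError("transcriptome-wide mean depth is zero")
    mean_in = sum(in_matrix) / len(in_matrix)
    return mean_in / mean_all


depth = {"geneA": [40, 40, 40, 40], "geneB": [2, 2, 2, 2], "geneC": [6, 6, 6, 6]}
included = {"geneA": range(4)}  # only the highly expressed gene made it in
print(relative_expression(depth, included))  # 2.5
```

A value above 1 means the bases captured in the matrix come from transcripts that are, on average, more highly expressed than the transcriptome as a whole.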

Cited by

Nuclear genomic signals of the 'microturbellarian' roots of platyhelminth evolutionary innovation.
Laumer CE, Hejnol A, Giribet G. Elife. 2015 Mar 12;4:e05503. doi: 10.7554/eLife.05503. PMID: 25764302. Free PMC article.
Integrating multi-origin expression data improves the resolution of deep phylogeny of ray-finned fish (Actinopterygii).
Zou M, Guo B, Tao W, Arratia G, He S. Sci Rep. 2012;2:665. doi: 10.1038/srep00665. Epub 2012 Sep 18. PMID: 22993690. Free PMC article.
Multigene phylogenetics reveals temporal diversification of major African malaria vectors.
Kamali M, Marek PE, Peery A, Antonio-Nkondjio C, Ndo C, Tu Z, Simard F, Sharakhov IV. PLoS One. 2014 Apr 4;9(4):e93580. doi: 10.1371/journal.pone.0093580. eCollection 2014. PMID: 24705448. Free PMC article.
Can deliberately incomplete gene sample augmentation improve a phylogeny estimate for the advanced moths and butterflies (Hexapoda: Lepidoptera)?
Cho S, Zwick A, Regier JC, Mitter C, Cummings MP, Yao J, Du Z, Zhao H, Kawahara AY, Weller S, Davis DR, Baixeras J, Brown JW, Parr C. Syst Biol. 2011 Dec;60(6):782-96. doi: 10.1093/sysbio/syr079. Epub 2011 Aug 16. PMID: 21840842. Free PMC article.
Polyploid evolution of the Brassicaceae during the Cenozoic era.
Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, Kessler D, Clarke WE, Edger PP, Links MG, Sharpe AG, Parkin IA. Plant Cell. 2014 Jul;26(7):2777-91. doi: 10.1105/tpc.114.126391. Epub 2014 Jul 17. PMID: 25035408. Free PMC article.
