Can deliberately incomplete gene sample augmentation improve a phylogeny estimate for the advanced moths and butterflies (Hexapoda: Lepidoptera)?
doi: 10.1093/sysbio/syr079. Epub 2011 Aug 16.
Soowon Cho, Andreas Zwick, Jerome C Regier, Charles Mitter, Michael P Cummings, Jianxiu Yao, Zaile Du, Hong Zhao, Akito Y Kawahara, Susan Weller, Donald R Davis, Joaquin Baixeras, John W Brown, Cynthia Parr
- PMID: 21840842
- PMCID: PMC3193767
- DOI: 10.1093/sysbio/syr079
Soowon Cho et al. Syst Biol. 2011 Dec.
Abstract
This paper addresses the question of whether one can economically improve the robustness of a molecular phylogeny estimate by increasing gene sampling in only a subset of taxa, without having the analysis invalidated by artifacts arising from large blocks of missing data. Our case study stems from an ongoing effort to resolve poorly understood deeper relationships in the large clade Ditrysia (>150,000 species) of the insect order Lepidoptera (butterflies and moths). Seeking to remedy the overall weak support for deeper divergences in an initial study based on five nuclear genes (6.6 kb) in 123 exemplars, we nearly tripled the total gene sample in sequence length (to 26 genes, 18.4 kb), but only in a third (41) of the taxa. The resulting partially augmented data matrix (45% intentionally missing data) consistently increased bootstrap support for groupings previously identified in the five-gene (nearly) complete matrix, while introducing no contradictory groupings of the kind that missing data have been predicted to produce. Our results add to growing evidence that data sets differing substantially in gene and taxon sampling can often be safely and profitably combined. The strongest overall support for nodes above the family level came from including all nucleotide changes while partitioning sites into sets undergoing mostly nonsynonymous versus mostly synonymous change. In contrast, support for the deepest node for which any persuasive molecular evidence has yet emerged (78–85% bootstrap) was weak or nonexistent unless synonymous change was entirely excluded, a result plausibly attributed to compositional heterogeneity. This node (Gelechioidea + Apoditrysia), tentatively proposed by previous authors on the basis of four morphological synapomorphies, is the first major subset of ditrysian superfamilies to receive strong statistical support in any phylogenetic study.
A "more-genes-only" data set (41 taxa × 26 genes) also gave strong signal for a second deep grouping (Macrolepidoptera) that was obscured, but not strongly contradicted, in more taxon-rich analyses.
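One way to "entirely exclude" synonymous change, as in the analyses described above, is degenerate recoding: each codon is replaced by IUPAC ambiguity codes spanning every codon for the same amino acid, so synonymous differences disappear while nonsynonymous ones remain. The Python sketch below is a simplified illustration under stated assumptions (standard genetic code, stop and ambiguous codons set to NNN, helper names are ours); it is not necessarily the recoding scheme used in the study, and published schemes may differ in detail.

```python
from itertools import product

# Simplified degenerate recoding. Assumption: this is an illustration,
# not necessarily the recoding scheme used in the study.

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first position varying slowest, matching AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity code for every non-empty set of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def build_degenerate_table(code=GENETIC_CODE):
    """Map each sense codon to the IUPAC codon covering all of its synonyms."""
    synonyms = {}
    for codon, aa in code.items():
        synonyms.setdefault(aa, []).append(codon)
    table = {}
    for codon, aa in code.items():
        if aa == "*":
            continue  # stop codons are not recoded in this sketch
        group = synonyms[aa]
        table[codon] = "".join(
            IUPAC[frozenset(c[pos] for c in group)] for pos in range(3)
        )
    return table


def degenerate_sequence(seq, table):
    """Recode an in-frame sequence so synonymous codons become identical."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon == "---":
            recoded.append(codon)  # keep whole-codon alignment gaps as gaps
        else:
            # ambiguous, partially gapped and stop codons become fully ambiguous
            recoded.append(table.get(codon, "NNN"))
    return "".join(recoded)


table = build_degenerate_table()
print(degenerate_sequence("CTATTG", table))  # two leucine codons -> YTNYTN
print(degenerate_sequence("TTTTTA", table))  # Phe then Leu -> TTYYTN
```

Both leucine codons collapse to the same recoded triplet, whereas the phenylalanine/leucine pair stays distinct, which is the property that removes synonymous signal while keeping nonsynonymous signal.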
Figures

Diagram of gene and taxon sampling design, showing relationships among the three data sets analyzed. a) Five-gene complete matrix (123 taxa; from Regier et al. 2009). b) Partially augmented matrix, deliberately incomplete, created by adding 21 genes for just 41 of the 123 taxa in the five-gene complete matrix. c) More-genes-only matrix, consisting of just the 41 species sequenced for all 26 genes.
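The arithmetic behind the deliberately incomplete matrix in panel b) can be sketched with only the numbers given in the abstract (123 taxa, 41 augmented, 5 vs. 26 genes, 6.6 vs. 18.4 kb); the function name is ours. Counted by aligned length, the planned gaps alone come to roughly 43%, close to the 45% reported; the published figure presumably reflects exact alignment lengths and how gaps were tallied, which this sketch does not model. Counted by gene instead, the share is 7/13 (about 54%), because the 21 added genes are on average shorter than the original five.

```python
# Back-of-envelope estimate of the intentional gaps in the partially augmented
# matrix. Inputs come from the abstract; the helper name is hypothetical.


def intentional_missing_fraction(n_taxa, n_augmented, core_size, full_size):
    """Share of matrix cells left empty when only n_augmented taxa receive
    the full gene sample and the remaining taxa keep only the core sample.

    core_size and full_size may be gene counts or alignment lengths (kb).
    """
    n_core_only = n_taxa - n_augmented
    empty_cells = n_core_only * (full_size - core_size)
    total_cells = n_taxa * full_size
    return empty_cells / total_cells


N_TAXA, N_AUGMENTED = 123, 41

by_length = intentional_missing_fraction(N_TAXA, N_AUGMENTED, 6.6, 18.4)
by_gene = intentional_missing_fraction(N_TAXA, N_AUGMENTED, 5, 26)

print(f"by aligned length: {by_length:.1%}")  # 42.8%
print(f"by gene count:     {by_gene:.1%}")    # 53.8%
```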

Comparison of ML trees of family relationships inferred from the five-gene complete matrix (left column) with those from the partially augmented matrix (123 taxa × 5 or 26 genes; right column), simplified from the full 123-taxon trees shown in Figures S1–S4. Black triangles denote families with multiple exemplars. Numbers in parentheses after each family name give the number of exemplars in the five-gene complete matrix and, for the partially augmented matrix, the number of exemplars with 26 genes/total number of exemplars. ### denotes families with one or more exemplars scored for 26 genes in the partially augmented matrix. Bootstrap percentages (BPs) > 50% are shown above branches; the number of bootstrap replicates is 1000 for a) and b) and 2000 for c) and d). a) nt123 partitioned, five-gene complete matrix; b) nt123 partitioned, partially augmented matrix; c) all nonsynonymous coding, five-gene complete matrix; d) all nonsynonymous coding, partially augmented matrix.
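A bootstrap percentage is the share of replicate trees, each inferred from an alignment whose columns were resampled with replacement, that contain a given clade; only clades above 50% are drawn in these figures. The sketch below shows the resampling and tallying steps on made-up taxa A–D; the maximum-likelihood search for each replicate is deliberately omitted, and the function names are ours rather than those of any particular program.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Nonparametric bootstrap: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def bootstrap_support(replicate_trees):
    """Percent of replicate trees that contain each clade.

    Each tree is an iterable of clades; each clade is a frozenset of tip labels.
    """
    counts = Counter()
    for tree in replicate_trees:
        counts.update(set(tree))  # a clade counts at most once per tree
    n = len(replicate_trees)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


def clades_to_show(support, threshold=50.0):
    """Keep only clades whose bootstrap percentage exceeds the threshold."""
    return {clade: bp for clade, bp in support.items() if bp > threshold}


# One pseudo-replicate of a tiny hypothetical alignment.
rng = random.Random(1)
replicate = resample_columns({"A": "ACGTAC", "B": "ACGTTC"}, rng)

# Hypothetical replicate trees; real ones would come from an ML search
# on each resampled alignment, which is not shown here.
trees = [
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("ABD")],
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AC"), frozenset("ABC")],
]
shown = clades_to_show(bootstrap_support(trees))
print(sorted("".join(sorted(c)) + f" {bp:.0f}%" for c, bp in shown.items()))
# ['AB 75%', 'ABC 75%']
```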

a)–c) ML trees of family relationships inferred from the more-genes-only data set (41 taxa × 26 genes), simplified (except for panel c) from the full 41-taxon trees shown in Figures S5 and S6. Black triangles denote families with multiple exemplars; the number of exemplars is shown in parentheses after each family name. a) all nonsynonymous coding, phylogram; b) all nonsynonymous coding, cladogram; c) nt123 partitioned. BPs > 50% are shown above branches; the number of replicates is 1000 for a) and 2000 for b). d) Relationships among the sampled families only, according to the morphology-based working hypothesis of Kristensen and Skalski (1998).
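The abstract describes partitioning sites into sets undergoing mostly nonsynonymous versus mostly synonymous change, which presumably corresponds to the "nt123 partitioned" analyses in these figures. The sketch below uses codon position as a crude proxy (first+second vs. third positions); that proxy and the helper name are our assumptions, not the partition definition used in the study.

```python
def codon_position_partitions(alignment):
    """Split an in-frame alignment into first+second and third codon positions.

    Assumption: codon position is only a rough proxy for sites changing mostly
    nonsynonymously (positions 1 and 2) versus mostly synonymously (position 3).
    """
    first_second, third = {}, {}
    for taxon, seq in alignment.items():
        if len(seq) % 3:
            raise ValueError(f"{taxon}: length is not a multiple of 3")
        first_second[taxon] = "".join(
            base for i, base in enumerate(seq) if i % 3 != 2
        )
        third[taxon] = seq[2::3]
    return first_second, third


toy = {"taxon1": "ATGCTATTT", "taxon2": "ATGTTGTTC"}  # hypothetical sequences
nt12, nt3 = codon_position_partitions(toy)
print(nt12["taxon1"], nt3["taxon1"])  # ATCTTT GAT
```

The toy pair also shows why the proxy is crude: the second codons (CTA vs. TTG) both encode leucine, yet they differ at a first position, so some synonymous change lands in the "mostly nonsynonymous" set.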
Similar articles
- Regier JC, Zwick A, Cummings MP, Kawahara AY, Cho S, Weller S, Roe A, Baixeras J, Brown JW, Parr C, Davis DR, Epstein M, Hallwachs W, Hausmann A, Janzen DH, Kitching IJ, Solis MA, Yen SH, Bazinet AL, Mitter C. BMC Evol Biol. 2009 Dec 2;9:280. doi: 10.1186/1471-2148-9-280. PMID: 19954545. Free PMC article.
- Kawahara AY, Ohshima I, Kawakita A, Regier JC, Mitter C, Cummings MP, Davis DR, Wagner DL, De Prins J, Lopez-Vaamonde C. BMC Evol Biol. 2011 Jun 24;11:182. doi: 10.1186/1471-2148-11-182. PMID: 21702958. Free PMC article.
- Bazinet AL, Cummings MP, Mitter KT, Mitter CW. PLoS One. 2013 Dec 4;8(12):e82615. doi: 10.1371/journal.pone.0082615. eCollection 2013. PMID: 24324810. Free PMC article.
- Phylogeny and Evolution of Lepidoptera. Mitter C, Davis DR, Cummings MP. Annu Rev Entomol. 2017 Jan 31;62:265-283. doi: 10.1146/annurev-ento-031616-035125. Epub 2016 Nov 16. PMID: 27860521. Review.
- Lepidoptera genomes: current knowledge, gaps and future directions. Triant DA, Cinel SD, Kawahara AY. Curr Opin Insect Sci. 2018 Feb;25:99-105. doi: 10.1016/j.cois.2017.12.004. Epub 2017 Dec 23. PMID: 29602369. Review.
Cited by
- Targeted enrichment: maximizing orthologous gene comparisons across deep evolutionary time. Hedtke SM, Morgan MJ, Cannatella DC, Hillis DM. PLoS One. 2013 Jul 2;8(7):e67908. doi: 10.1371/journal.pone.0067908. Print 2013. PMID: 23844125. Free PMC article.
- Wu L, Tong Y, Ayivi SPG, Storey KB, Zhang JY, Yu DN. Animals (Basel). 2022 Aug 9;12(16):2015. doi: 10.3390/ani12162015. PMID: 36009607. Free PMC article.
- Wu LW, Lin LH, Lees DC, Hsu YF. BMC Genomics. 2014 Jun 12;15:468. doi: 10.1186/1471-2164-15-468. PMID: 24923777. Free PMC article.
- Complete mitochondrial genome of Pterodecta felderi (Lepidoptera: Callidulidae). Lee KH, Kim MJ, Park JS, Kim I. Mitochondrial DNA B Resour. 2020 Nov 11;5(3):3730-3732. doi: 10.1080/23802359.2020.1833777. PMID: 33367079. Free PMC article.
- Regier JC, Brown JW, Mitter C, Baixeras J, Cho S, Cummings MP, Zwick A. PLoS One. 2012;7(4):e35574. doi: 10.1371/journal.pone.0035574. Epub 2012 Apr 19. PMID: 22536410. Free PMC article.
References
- Bazinet AL, Cummings MP. The lattice project: a grid research and production environment combining multiple grid computing models. In: Weber MHW, editor. Distributed and grid computing—science made transparent for everyone. Principles, applications and supporting communities. Marburg (Germany): Tectum Publishing House; 2009. pp. 2–13.
- Brower AVZ, DeSalle R. Mitochondrial vs. nuclear DNA sequence evolution among nymphalid butterflies: the utility of wingless as a source of characters for phylogenetic inference. Insect Mol. Biol. 1998;7:1–10.
- Cummings MP, Huskamp JC. Grid computing. EDUCAUSE Rev. 2005;40:116–117.
- de Queiroz A, Gatesy J. The supermatrix approach to systematics. Trends Ecol. Evol. 2007;22:34–41.