Ultrafast approximation for phylogenetic bootstrap

Bui Quang Minh et al. Mol Biol Evol. 2013 May.

Abstract

The nonparametric bootstrap is widely used in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-aLRT) have been introduced to speed up the bootstrap. Here, we suggest an ultrafast bootstrap approximation approach (UFBoot) to compute the support of phylogenetic groups in maximum likelihood (ML) based trees. To achieve this, we combine the resampling estimated log-likelihood (RELL) method with a simple but effective collection scheme of candidate trees. We also propose a stopping rule that assesses the convergence of branch support values to automatically determine when to stop collecting candidate trees. Compared with RAxML RBS, UFBoot achieves a median speedup of 3.1 (range: 0.66-33.3) for real DNA alignments and 10.2 (range: 1.32-41.4) for real amino acid alignments. Moreover, our extensive simulations show that UFBoot is robust against moderate model violations and that the support values obtained appear to be relatively unbiased compared with the conservative standard bootstrap. This provides a more direct interpretation of the bootstrap support. We offer efficient and easy-to-use software (available at http://www.cibiv.at/software/iqtree) to perform the UFBoot analysis with ML tree inference.
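The resampling estimated log-likelihood idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the UFBoot or IQ-TREE implementation: the function name `rell_support`, its parameters, and the toy input are hypothetical, and the sketch only reports how often each candidate tree wins a replicate, whereas the paper concerns the support of groups within trees. The assumed input is a table of per-site log-likelihoods, one row per candidate tree, computed once on the original alignment. Each bootstrap replicate then reweights those fixed values instead of re-optimizing any tree.

```python
import random


def rell_support(site_lnl, n_replicates=1000, seed=0):
    """Generic RELL sketch (hypothetical helper, not the IQ-TREE API).

    site_lnl[t][s] is the log-likelihood of alignment site s under
    candidate tree t. Returns, for each tree, the fraction of bootstrap
    replicates in which it has the highest resampled log-likelihood.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees

    for _ in range(n_replicates):
        # Draw n_sites sites with replacement; counts[s] is how many
        # times site s appears in this pseudo-alignment.
        counts = [0] * n_sites
        for _ in range(n_sites):
            counts[rng.randrange(n_sites)] += 1

        # The replicate log-likelihood of each tree is a weighted sum of
        # the per-site values; no tree is re-estimated.
        totals = [
            sum(c * lnl for c, lnl in zip(counts, row)) for row in site_lnl
        ]

        # Credit the tree with the highest resampled log-likelihood
        # (ties go to the tree listed first).
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1

    return [w / n_replicates for w in wins]


# Toy input: tree 0 is better at every site, so it wins every replicate.
toy = [[-1.0] * 10, [-2.0] * 10]
support = rell_support(toy, n_replicates=200)
```

Because the per-site log-likelihoods are fixed, each replicate costs only one weighted sum per candidate tree, which is why this kind of resampling is much cheaper than rerunning a tree search on every pseudo-alignment.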

Figures

Fig. 1.

Accuracies of SBS, RBS with RAxML, SH-aLRT with PhyML, and UFBoot approximation from the Yule–Harding (left panel) and the PANDIT-based simulations (right panel).

Fig. 2.

Impact of moderate (JC + Γ) and severe model violations (JC) on the accuracies of SBS, SH-aLRT, and UFBoot in the PANDIT-based simulations.

Fig. 3.

Distributions of run-time ratios (log2 scale) between RBS and UFBoot for 300 DNA and AA PANDIT alignments. The percentages of alignments where UFBoot runs slower (left of the dashed line) or faster (right of the dashed line) than RBS are shown.

Fig. 4.

Schematic view of the tree space sampled by the IQPNNI algorithm. The solid curve reflects the log-likelihood surface on the tree space. The structure of the tree space is defined by the NNI operations, where each n-taxon tree has exactly 2(n - 3) neighboring trees.
