An ideal bacterial phylogenetic tree accurately retraces evolutionary history and places mutational, recombination and other events on the appropriate branches. Current strain-level bacterial phylogenetic analysis based on large numbers of genomes lacks reliability and resolution, and is hard to replicate, confirm and reuse, because of the highly divergent nature of microbial genomes. We present SNPs and Recombination Events Tree (SaRTree), a pipeline using six "living trees" modules that addresses problems arising from the high numbers and variable quality of bacterial genome sequences. It provides for reuse of the tree and offers a major step toward global standardization of phylogenetic analysis by generating deposit files that record all steps involved in phylogenetic inference. The tree itself is a "living tree" that can be extended by adding more sequences, and the deposit can be used to vary the programs or parameters and so assess the effect of such changes. This approach will allow phylogeny papers to meet the traditional responsibility of providing data and analysis that can be repeated and critically evaluated by others. We used the Acinetobacter baumannii global clone I to illustrate the use of SaRTree to optimize tree resolution. An Escherichia coli tree was built from 351 sequences selected from 11,162 genome sequences, with the others added back onto well-defined branches, to show how this facility can greatly improve the outcomes from genome sequencing. SaRTree is designed for prokaryote strain-level analysis but could be adapted for other uses.
Keywords: bacteria evolution; high-resolution tree construction; phylogenetic tree.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.