In statistical phylogenetic analyses of DNA sequences, models of evolutionary change commonly assume that base composition is stationary through time and across lineages. This assumption is violated by many data sets, but it is unclear whether the magnitude of these violations is sufficient to mislead phylogenetic inference. We investigated the impacts of compositional heterogeneity on phylogenetic estimates using a method for assessing model adequacy. Based on a detailed simulation study, we found that common frequentist criteria are highly conservative, such that the model is often rejected when the phylogenetic estimates do not show clear signs of bias. We propose new criteria and provide guidelines for their usage. We apply these criteria to genome-scale data from 40 birds and find that loci with severely non-homogeneous base composition are uncommon. Our results show the importance of using well-informed diagnostic statistics when testing model adequacy for phylogenomic analyses.
Keywords: chi-squared statistic test; compositional heterogeneity; model adequacy; phylogenetic inference; predictive simulations; substitution models.
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.