Stochastic models of biological evolution generally assume that different characters (runs of the stochastic process) are independent and identically distributed. In this article we determine the asymptotic complexity of detecting dependence for some fairly general models of evolution, combined with simple models of dependence. A key difference from much of the previous work is that our algorithms operate without knowledge of the tree topology. Specifically, we consider stochastic models of evolution ranging from the common ones used by biologists (such as the Cavender-Farris-Neyman and Jukes-Cantor models) to very general ones in which the evolution of different characters can be governed by different transition matrices on each edge of the evolutionary tree (phylogeny). We also consider several models of dependence between two characters. In the most specific model, on each edge of the phylogeny the joint distribution of the dependent characters undergoes a perturbation of a fixed magnitude, in a fixed direction, from what it would be if the characters were evolving independently. More general dependence models do not require such a strong "signal"; instead, they require only that on each edge the perturbation of the joint distribution has a significant component in a specific direction. Our main results are nearly tight bounds on the induced (operator) norm of the transition matrices under which dependence can be detected efficiently, for most of the models of evolution and dependence that we consider. We make essential use of a new concentration result for multistate random variables of a Markov random field on arbitrary trivalent trees: we show that the random variable counting the number of leaves in any particular state has variance that is subquadratic in the number of leaves.
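As a toy illustration (not part of the paper's algorithms), the two-state Cavender-Farris-Neyman model on a trivalent tree can be simulated directly, and the variance of the leaf-count statistic estimated by Monte Carlo. The function names, the complete-binary-tree topology, and the parameter values below are illustrative assumptions, not definitions from the article.

```python
import random

def simulate_cfn(depth, flip_prob, root_state=0, rng=random):
    """Simulate the CFN (two-state symmetric) model on a complete
    binary tree of the given depth: along each edge, the child's
    state equals the parent's state, flipped with probability
    flip_prob. Returns the list of leaf states (0/1)."""
    states = [root_state]
    for _ in range(depth):
        nxt = []
        for s in states:
            for _ in range(2):  # trivalent: two children per internal node
                nxt.append(s ^ (rng.random() < flip_prob))
        states = nxt
    return states

def leaf_count_variance(depth, flip_prob, trials=2000, seed=0):
    """Monte Carlo estimate of Var[# of leaves in state 1]."""
    rng = random.Random(seed)
    counts = [sum(simulate_cfn(depth, flip_prob, rng=rng))
              for _ in range(trials)]
    mean = sum(counts) / trials
    return sum((c - mean) ** 2 for c in counts) / (trials - 1)
```

Because leaf states are positively correlated through shared ancestry, the variance of the leaf count grows faster than linearly in the number of leaves n; the concentration result in the article shows it nevertheless remains subquadratic, i.e. o(n^2).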
Keywords: dependent characters; efficient algorithms; evolution; stochastic models.