Reciprocal best match graphs

J Math Biol. 2020 Feb;80(3):865-953. doi: 10.1007/s00285-019-01444-2. Epub 2019 Oct 30.

Abstract

Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that are also cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
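
The tree-based definition of reciprocal best matches can be illustrated with a small sketch. It is not the authors' code. It assumes one common reading of "most closely related": a leaf y of color c is a best match of a leaf x if no other leaf of color c has a last common ancestor with x that lies further from the root. The function names, the parent-map encoding of the tree, and the toy tree ((a1,b1),(a2,c1)) are all hypothetical choices made for this illustration.

```python
from itertools import combinations

def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca_depth(parent, x, y):
    """Depth (edges from the root) of the last common ancestor of x and y."""
    px = ancestors(parent, x)[::-1]  # root first
    py = ancestors(parent, y)[::-1]
    depth = -1
    for a, b in zip(px, py):
        if a != b:
            break
        depth += 1
    return depth

def reciprocal_best_matches(parent, color):
    """Edges {x, y} such that x and y are mutual best matches (assumed definition)."""
    leaves = list(color)

    def best(x, c):
        # Leaves of color c whose last common ancestor with x is deepest.
        cands = [y for y in leaves if color[y] == c]
        top = max(lca_depth(parent, x, y) for y in cands)
        return {y for y in cands if lca_depth(parent, x, y) == top}

    edges = set()
    for x, y in combinations(leaves, 2):
        if color[x] == color[y]:
            continue  # only leaves of different colors (species) are compared
        if y in best(x, color[y]) and x in best(y, color[x]):
            edges.add(frozenset((x, y)))
    return edges

# Hypothetical toy tree ((a1,b1),(a2,c1)) with inner vertices u, v and root r.
parent = {"a1": "u", "b1": "u", "a2": "v", "c1": "v", "u": "r", "v": "r"}
color = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}

edges = reciprocal_best_matches(parent, color)
print(sorted(sorted(e) for e in edges))
# prints [['a1', 'b1'], ['a2', 'c1'], ['b1', 'c1']]
```

In this toy example a1 and c1 are not joined: c1 is the only leaf of color C and hence a best match of a1, but the closest leaf of color A to c1 is a2, so the relation is not reciprocal. The quadratic pairwise loop is chosen for readability, not efficiency.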

Keywords: Hierarchically colored cograph; Pairwise best hit; Phylogenetic tree; Reciprocal best match heuristics; Vertex colored graph.

MeSH terms

  • Computational Biology
  • Molecular Sequence Annotation / methods*
  • Phylogeny*
  • Sequence Analysis, DNA / methods*