Proteins adapt to novel environments or gain new functions through amino acid substitutions. Mutations in protein-coding genes are therefore subject to selection pressure. The strength and character of this pressure may vary among regions of a protein, so its spatial distribution provides information on the protein's adaptive evolution. We developed a hierarchical Bayesian model that detects the spatial distribution of selection pressure on a protein. We expressed selection pressure as the ratio of the nonsynonymous to the synonymous substitution rate in the DNA sequence. A Potts model serves as the prior distribution describing the spatial aggregation of selection pressure. The hyperparameters that define the strength and range of spatial clustering are estimated by maximizing the marginal likelihood. Because our prior distribution is unnormalized, we calculated the log marginal likelihood by "thermodynamic integration." We applied the method to historical data on the influenza hemagglutinin protein and compared the estimated spatial distribution of the substitution rate ratio with that of antigenic sites A-E. The amino acid residues with higher substitution rate ratios, which indicate diversifying selection pressure, overlapped the antigenic sites.
Keywords: Potts model; hierarchical Bayesian model; molecular evolution; selection pressure; spatial distribution.
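As an illustration of how thermodynamic integration can recover the normalizing constant of an unnormalized Potts prior, the following minimal sketch treats a toy case and is not the implementation used in this work. It assumes a hypothetical open chain of sites with nearest-neighbor edges, a single coupling parameter `beta`, and exact enumeration in place of the MCMC sampling a real protein structure would require. It relies on the identity d log Z(b)/db = E_b[S], where S counts neighboring pairs that share a state, together with log Z(0) = n log q. All function names are illustrative.

```python
import itertools
import math


def agreement_histogram(n_sites, q, edges):
    """Count configurations by S = number of edges whose endpoints share a state."""
    hist = {}
    for labels in itertools.product(range(q), repeat=n_sites):
        s = sum(1 for i, j in edges if labels[i] == labels[j])
        hist[s] = hist.get(s, 0) + 1
    return hist


def expected_agreements(beta, hist):
    """Exact E_beta[S] under the Potts distribution p(x) proportional to exp(beta * S(x))."""
    num = sum(s * c * math.exp(beta * s) for s, c in hist.items())
    den = sum(c * math.exp(beta * s) for s, c in hist.items())
    return num / den


def log_z_thermodynamic(beta, n_sites, q, edges, n_grid=200):
    """log Z(beta) = n log q + integral from 0 to beta of E_b[S] db (trapezoidal rule)."""
    hist = agreement_histogram(n_sites, q, edges)
    h = beta / n_grid
    vals = [expected_agreements(k * h, hist) for k in range(n_grid + 1)]
    integral = h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
    return n_sites * math.log(q) + integral


def log_z_exact_chain(beta, n_sites, q):
    """Closed form for an open chain: Z = q * (exp(beta) + q - 1)^(n - 1)."""
    return math.log(q) + (n_sites - 1) * math.log(math.exp(beta) + q - 1)


if __name__ == "__main__":
    # Hypothetical toy setting: 6 sites in a chain, 3 selection-pressure classes.
    n_sites, q, beta = 6, 3, 1.0
    edges = [(i, i + 1) for i in range(n_sites - 1)]
    print(log_z_thermodynamic(beta, n_sites, q, edges))
    print(log_z_exact_chain(beta, n_sites, q))
```

The chain is chosen only because its partition function has a closed form against which the integral can be checked. For a neighborhood graph derived from a three-dimensional protein structure, the expectation at each grid point would have to be estimated by sampling rather than enumeration.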