Multivariate autoregressive model for a study of phylogenetic diversity

Gene. 2009 Apr 15;435(1-2):104-18. doi: 10.1016/j.gene.2009.01.009. Epub 2009 Feb 10.

Abstract

We present a computationally efficient model for parameterizing DNA sequences in a way that comprehensively describes their auto- and cross-correlation structure. The approach is based on a four-channel multivariate autoregressive (MVAR) model. The model was applied to a study of genes from the globin family in 6 vertebrate species. First, the sequences were coded as four signals (one per nucleotide), which were fitted to a four-channel MVAR model. From the correlation matrices, the vectors of model coefficients were calculated as functions of the nucleotide distance. The between-chromosome and inter-species differences were best distinguished in the cross-coefficients linking the signals of different nucleotides. For clustering purposes, different metrics were tested, and then two clustering procedures (Nearest Neighbor and UPGMA) were applied. Clustering trees and consensus trees were constructed for exons, introns and whole genes. The results agreed with the known dependencies between the chromosomes of the globin family. Orthologous genes from different species were grouped together, and within these groups phylogenetically close organisms were placed near one another.
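
The first two steps described above (coding a sequence as four signals and forming correlation matrices as a function of nucleotide distance) can be illustrated with a minimal sketch. The abstract does not give the exact coding or estimator, so the 0/1 indicator coding, the biased sample covariance, and the function names `encode`, `lagged_covariance` and `correlation_matrix` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

NUCLEOTIDES = "ACGT"  # assumed channel order; the abstract does not specify one


def encode(seq: str) -> Dict[str, List[float]]:
    """Indicator coding (assumption): channel n is 1.0 where the base is n, else 0.0."""
    seq = seq.upper()
    return {n: [1.0 if base == n else 0.0 for base in seq] for n in NUCLEOTIDES}


def lagged_covariance(x: List[float], y: List[float], lag: int) -> float:
    """Biased sample covariance between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag))
    return total / n


def correlation_matrix(seq: str, lag: int) -> List[List[float]]:
    """4x4 matrix of lagged covariances; rows and columns follow NUCLEOTIDES."""
    channels = encode(seq)
    return [
        [lagged_covariance(channels[a], channels[b], lag) for b in NUCLEOTIDES]
        for a in NUCLEOTIDES
    ]


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence, not real globin data
    for lag in (0, 1, 2):
        print(lag, correlation_matrix(toy, lag))
```

In a standard MVAR fit, matrices like these for lags 0 to p would feed the Yule-Walker equations, whose solution gives the coefficient matrices for each lag. Note that with indicator coding the four channels sum to one at every position, so the lag-0 matrix is singular and a real fit would need to account for that, for example with a pseudo-inverse.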

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Computer Simulation
  • DNA, Mitochondrial
  • Databases, Genetic
  • Genetic Variation / genetics*
  • Models, Genetic
  • Models, Statistical*
  • Multivariate Analysis
  • Phylogeny*

Substances

  • DNA, Mitochondrial