A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example

BMC Genomics. 2005 Nov 4;6:153. doi: 10.1186/1471-2164-6-153.

Abstract

Background: Large-scale, reliable functional annotation of proteins is a major challenge in modern biology. Phylogenetic analyses have proven important for this task. Until now, however, phylogenetic annotation has not taken expression data (e.g. ESTs, microarrays, SAGE) into account. Integrating such data, for example ESTs, into phylogenetic annotation could therefore be a major advance in post-genomic analysis. We developed an approach that combines expression data with phylogenetic analysis. To illustrate the method, we used an example protein family, the peptidyl arginine deiminases (PADs), which are probably implicated in rheumatoid arthritis.

Results: The analysis was performed as follows. We first built a phylogeny of PAD proteins from the NCBI NR protein database. We then completed the phylogenetic reconstruction of PADs using an enlarged sequence database containing translations of EST contigs. Finally, we extracted all corresponding expression data contained in the EST database. This analysis allowed us (1) to extend the range of species containing homologs and to improve the reconstruction of the genes' evolutionary history; (2) to deduce an accurate gene expression pattern for each member of this protein family; and (3) to show a correlation between the evolutionary rate of paralogous sequences and their pattern of tissue expression.
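The third result, relating evolutionary rate to tissue expression pattern, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how expression breadth or evolutionary rate were measured, nor which statistic was used. The sketch assumes that expression breadth is the number of distinct tissues among the source libraries of a gene's ESTs, that evolutionary rate is approximated by a branch length from the phylogeny, and that the association is measured by Spearman rank correlation. The gene names and all values are hypothetical toy data, not results from the paper, and the direction of the toy correlation is arbitrary.

```python
from collections import Counter


def expression_breadth(est_tissues, min_ests=1):
    """Count distinct tissues supported by at least `min_ests` ESTs."""
    counts = Counter(est_tissues)
    return sum(1 for n in counts.values() if n >= min_ests)


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy input: tissue of origin of each EST assigned to a paralog.
est_tissues_by_gene = {
    "paralog_1": ["skin", "skin", "muscle", "brain", "uterus"],
    "paralog_2": ["skin", "skin", "skin"],
    "paralog_3": ["bone marrow", "spleen"],
    "paralog_4": ["brain", "eye", "muscle"],
}
# Hypothetical branch lengths standing in for evolutionary rate.
branch_length = {
    "paralog_1": 0.05,
    "paralog_2": 0.30,
    "paralog_3": 0.18,
    "paralog_4": 0.10,
}

genes = sorted(est_tissues_by_gene)
breadth = [expression_breadth(est_tissues_by_gene[g]) for g in genes]
rate = [branch_length[g] for g in genes]
rho = spearman(breadth, rate)
print("Spearman rho between expression breadth and branch length:", rho)
```

With only a handful of paralogs, as in a single gene family, a rank correlation of this kind is descriptive at best; the `min_ests` threshold is one simple way to avoid counting a tissue on the evidence of a single EST.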

Conclusion: Coupling phylogenetic reconstruction with expression data is a promising approach that could be applied to all multigenic families, both to investigate the relationship between molecular and transcriptional evolution and to improve functional annotation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arthritis, Rheumatoid / genetics
  • Computational Biology
  • Contig Mapping
  • Databases, Genetic
  • Databases, Protein
  • Evolution, Molecular
  • Expressed Sequence Tags
  • Gene Expression
  • Gene Expression Regulation*
  • Gene Library
  • Genome
  • Genome, Human
  • Genomics
  • Humans
  • Hydrolases / chemistry
  • Hydrolases / genetics*
  • Mice
  • Models, Statistical
  • Multigene Family
  • Oligonucleotide Array Sequence Analysis
  • Phylogeny
  • Protein-Arginine Deiminases
  • Proteins
  • Tissue Distribution
  • Transcription, Genetic

Substances

  • Proteins
  • Hydrolases
  • Protein-Arginine Deiminases